ABSTRACT

The  objective  of  this  research  has  been  the  creation  of  a  hardware  design  for  a 
Predictive  Read  Cache  (PRC).  The  PRC  is  a  developmental  cache  intended  to  replace 
second-level  caches  common  in  modern  microprocessor  systems.  The  PRC  has  the  potential 
of  being  faster  and  cheaper  than  current  second-level  caches  and  is  distinctive  in  its  ability  to 
predict  data  addresses  to  be  referenced  by  a  central  processing  unit. 

Previous  research  has  analyzed  the  behavior  that  the  PRC  must  exhibit.  During  the 
described  research,  the  behavior  was  modeled  in  the  Verilog  hardware  description  language. 
Verilog-XL, which takes the Verilog behavioral model as input, was used for simulation. The
behavioral  model  suggests  that  the  internal  structure  of  the  PRC  could  be  divided  into  six 
modules,  each  performing  part  of  the  function  of  the  whole  PRC.  Each  of  these  blocks  was 
studied  for  hardware  equivalents,  easing  the  development  of  the  total  structural  model. 

Using  Verilog  structural  models  as  input,  Epoch  was  used  to  automatically  perform  a 
very  large-scale  integrated  (VLSI)  circuit  layout  and  to  generate  timing  information.  The 
Epoch  output  files  are  used  for  further  simulation  with  Verilog-XL  to  identify  critical  parts 
of the design. The result of this research is a complete hardware design for the PRC.

I. INTRODUCTION


A.  HISTORY 

Billingsley  and  Fouts  demonstrated  the  viability  of  using 
an  address  predicting  buffer  to  reduce  memory  latency  in 
computer  systems.  "The  implementation  of  a  MPB  [Memory 
Prediction  Buffer]  is  less  expensive  than  a  next-level  cache 
and  delivers  a  comparable  performance  enhancement." 
(Billingsley,  1992) 

With  this  in  mind,  Nowicki  designed  a  Read  Prediction 
Buffer  (RPB)  as  part  of  his  thesis  work  in  1992  (Nowicki, 
1992).  This  RPB  was  capable  of  prefetching  data  based  on  the 
previous  pattern  of  memory  accesses.  Continuing  the  work  of 
Nowicki,  Aguilar  tested  that  design  and  suggested  several 
enhancements to improve it (Aguilar, 1995). A tentative
design of this new Predictive Read Cache (PRC) was a part of
his thesis work.

Aguilar  proposed  a  design  consisting  of  six  modules  which 
together would comprise the PRC. He designed four of those
six  modules,  testing  each  independently,  but  not  together. 

B.  PRINCIPLE  OF  OPERATION 

The  Predictive  Read  Cache  stores  data  only,  not 
instructions.  The  design  is  based  on  a  couple  of  observations 
about  data  fetches  from  main  memory.  First,  within  a 
specific  block  of  data,  the  accesses  often  occur  in  sequential 
patterns such as every element in order, or every other
element in reverse order. The second observation is that a
program  often  uses  several  blocks  of  data  concurrently. 

The  PRC  takes  advantage  of  the  access  patterns  to  predict 
future  memory  access  addresses.  The  prediction  is  based  on  a 
linear displacement of the addresses. The PRC calculates the
difference  between  two  given  addresses,  then  adds  the 
difference  to  the  most  recent  address  to  arrive  at  the 
predicted  address.  For  example,  if  the  Central  Processing 
Unit  (CPU)  accesses  the  data  at  address  20h  (hexadecimal  20) 
and  then  at  address  40h,  the  PRC  predicts  that  the  CPU  soon 
will  need  the  data  at  60h.  Once  the  PRC  has  predicted  an 
address,  it  fetches  the  data  from  that  address.  Once  the  data 
is  stored  in  the  PRC,  the  PRC  can  deliver  that  data  to  the  CPU 
much  more  quickly  than  the  main  memory  could  deliver  the  data. 

The  PRC  handles  multiple  data  blocks  through  its  "lines." 
Each  line  is  capable  of  tracking  the  pattern  of  accesses 
within  a  unique  block  of  data.  Thus,  the  PRC  can  track  only 
as  many  access  patterns  as  it  has  lines. 

When  the  cache  is  full  and  a  new  access  pattern  begins, 
a  line  has  to  be  replaced.   Lines  that  have  not  been  used 
recently  become  aged.   Aged  lines  are  the  first  to  be  replaced 
when  the  cache  is  full. 

Data  incoherency  is  avoided  through  the  process  of 
flushing  lines.  When  a  line  is  flushed,  that  line  is  marked 
as  containing  invalid  data  and  is  made  available  for  tracking 
new  access  patterns.  If  the  CPU  writes  data  to  an  address  from 
which  the  PRC  has  prefetched  data,  the  PRC  flushes  the  line 
with  that  data. 


C.  RESEARCH  GOALS 

The  objective  of  this  research  is  to  create  a  complete 
hardware design of the PRC. Completing the design takes
priority over performance, though the PRC must still
outperform main memory for this design to be of any value.

The  performance  is  measured  in  terms  of  the  rate  at  which 
the  Central  Processing  Unit  (CPU)  can  access  the  data  in  the 
PRC.  In  the  microprocessor  system  for  which  this  PRC  design 
is  created,  data  accesses  occur  in  groups.  The  groups  are 
called "bursts." Each access within a burst is called a
"beat." With a 60-ns memory and a 66-MHz system clock, the
four-beat  burst  operation  takes  8-3-3-3  cycles,  that  is,  eight 
cycles  for  the  first  beat  and  three  more  cycles  for  each  of 
the  three  remaining  beats.  The  design  of  the  PRC  must  perform 
at  least  this  well  and  preferably  much  faster. 
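
As a rough illustration of this baseline, the arithmetic can be sketched in Python. The sketch is not part of the original Verilog models; the function name burst_time is hypothetical, and only the 66-MHz clock and the 8-3-3-3 pattern come from the text.

```python
CLOCK_MHZ = 66  # system clock rate stated in the text


def burst_time(pattern, clock_mhz=CLOCK_MHZ):
    """Return (total_cycles, total_ns) for a burst such as (8, 3, 3, 3)."""
    cycles = sum(pattern)            # one entry per beat
    period_ns = 1000.0 / clock_mhz   # length of one clock period in ns
    return cycles, cycles * period_ns


dram_cycles, dram_ns = burst_time((8, 3, 3, 3))
print(dram_cycles, round(dram_ns, 1))
```

Under these assumptions the main-memory burst takes 17 cycles, or roughly 258 ns, which is the figure the PRC has to beat.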

D.  THESIS  STRUCTURE 

The  Testbench  is  presented  first,  which  is  the  Verilog 
model  of  the  environment  in  which  the  PRC  is  expected  to 
operate.  This  description  includes  a  summary  of  the  bus 
protocol  and  results  of  tests  that  show  the  correct 
performance  of  the  Testbench. 

The  description  of  the  behavioral  model  design  phase  is 
presented next. This chapter presents a simple pseudocode
model  of  the  PRC  which  is  used  to  develop  an  appropriate  data 
structure  and  block  diagram  for  the  PRC.  The  individual 
blocks are each modeled with Verilog and then connected
together in the Testbench to verify that the entire PRC works
as  desired. 

Once  the  behavioral  model  design  phase  is  complete,  each 
block  is  converted  into  a  hardware  (structural)  model.  This 
phase  of  the  design  is  detailed  in  Chapter  IV. 

This  thesis  also  contains  a  description  of  the  Computer 
Aided  Design  (CAD)  tools  used  for  this  research.  The 
descriptions  include  tips  for  making  their  use  easier  and 
descriptions  of  any  problems  encountered. 


II.  TESTBENCH 


This  chapter  describes  the  Testbench,  the  environment  in 
which the Predictive Read Cache (PRC) was designed to operate.
In  particular,  it  summarizes  the  bus  arbitration  protocol  and 
explains  the  important  aspects  of  each  part  of  the  Testbench. 
The  chapter  concludes  with  the  test  results  of  the  Testbench 
itself. 

A.     OVERVIEW  OF  TESTBENCH 

The  Testbench  models  and  simulates  the  environment  in 
which  the  PRC  design  was  tested.  As  indicated  in  Figure  1,  it 
comprises  four  blocks,  one  of  which  is  the  PRC  itself.  The 
Testbench  was  developed  with  Verilog  behavioral  models.  The 
CPU module simulates various functions of a PowerPC-603. The
Memory module simulates the behavior of a 60-ns dynamic random
access memory (DRAM). The Arbiter controls access to both the
address  and  data  busses.  Each  of  these  modules  is  described 
in  more  detail  in  the  following  sections,  after  a  description 
of the PowerPC-603 bus protocol.

Figure 1. Block Diagram of Testbench.


There  were  four  major  decisions  made  regarding  the  design 
of  the  Testbench.  The  first  decision  was  to  use  a  PowerPC-603 
microprocessor  system  as  the  environment  in  which  this  PRC 
will  operate.  The  work  of  Aguilar  was  started  using  the  '603 
(Aguilar, 1995). The '603 is still a current member of the PowerPC
family, so the protocol should not be out of date for quite some
time.

The second design decision was to limit the '603 to in-order
transactions. The '603 is capable of performing certain
sequences  of  data  transfers  out  of  order.  That  is,  the  order 
of  the  data  bus  cycles  can  be  different  from  the  order  of  the 
address  bus  cycles.  Prohibiting  these  transactions  made  the 
CPU  model  simpler  and  simplified  the  design  of  the  PRC.  This 
did  not  undermine  the  demonstration  of  the  PRC  as  a  viable 
memory management tool.

The third design decision was to use a 66-MHz system bus
and CPU clock rate. Sixty-six MHz is a reasonably fast system
bus speed. Designing for a slower bus speed could severely
reduce the applicability of this design to modern systems.

The fourth decision was to use the 64-bit data bus rather than
the optional 32-bit configuration. When configured with the
64-bit  data  bus,  the  PowerPC-603  can  access  memory  in  one  of 
two  modes:  single-beat  or  four-beat  burst.  A  single  beat  is 
one  memory  access  of  one  to  eight  bytes.  A  four-beat  burst  is 
a sequence of four consecutive memory accesses, eight bytes per
beat, totaling 32 bytes. When configured with the 32-bit data
bus, the '603 can access memory in one of three modes: single-beat
(one to four bytes), two-beat burst (eight bytes), or
eight-beat burst (32 bytes). Data transfers are less
complicated  with  the  64-bit  data  bus  since  there  are  fewer 
transfer  options  and  a  smaller  number  of  beats.  Also,  the 
time  from  one  cache  miss  to  the  next  is  independent  of  the 
data  bus  size.  Since  a  burst  transfer  on  the  32-bit  bus  takes 
more  cycles,  there  is  much  less  time  between  cache  misses  for 
the  PRC  to  do  its  job,  perhaps  too  little  time.  Further,  the 
32-bit  mode  is  specific  to  the  '603;  therefore,  the  PRC  would 
have  to  be  redesigned  to  be  used  with  the  other  64-bit  bus 
members of the PowerPC family. A disadvantage of the 64-bit
option is that the number of pins required for the PRC increases
from about 108 to about 140.

B.     SUMMARY  OF  '603  PROTOCOL 

The  PowerPC-603  has  separate  data  and  address  busses, 
each  with  independent  cycles,  referred  to  as  tenures  by  the 
Motorola engineers. Each tenure has three phases: arbitration,
transfer and termination.


The  system  has  a  bus  arbitration  unit  which  controls  the 
passing  of  bus  mastership  between  the  requesting  units.  In 
this  implementation,  the  CPU  and  the  PRC  are  the  only 
candidates  for  bus  mastership.  Module  Arbiter  is  the 
arbitration  unit. 

When a unit wants the bus, it asserts BR_ (bus request).
If  the  unit  can  have  the  bus  next,  the  arbiter  asserts  BG_ 
(bus  grant)  back  to  that  unit.  Then  the  unit  waits,  if 
necessary,  for  the  previous  master  to  finish  its  tenure,  after 
which  the  unit  takes  mastership  by  asserting  ABB_  (address  bus 
busy).  When  the  current  master  is  done  with  the  address  bus, 
it  negates  ABB_. 

This system has no external cache or multiple processors;
thus,  there  are  no  address-only  transactions.  If  a  unit  wants 
the  address  bus,  it  will  also  want  the  data  bus.  After 
granting  the  address  bus  by  asserting  BG_,  the  arbiter  then 
grants  the  data  bus  by  asserting  DBG_. 

Both  BG_  and  DBG_  remain  asserted  until  the  requesting 
unit  takes  mastership  or  withdraws  its  request  by  negating 
BR_.  If  there  are  no  pending  bus  requests,  the  arbiter  "parks" 
the  CPU  by  granting  it  the  busses.  If  the  CPU  is  parked,  it 
does  not  have  to  take  the  time  to  request  the  bus,  thereby 
reducing  the  time  for  the  memory  access.  If  the  CPU  is  parked 
and  the  PRC  requests  the  bus,  the  arbiter  unparks  the  CPU  and 
grants  the  bus  to  the  PRC. 


C.     TESTBENCH

The  Testbench  is  the  highest  level  in  the  design 
hierarchy.  It  connects  the  CPU,  PRC,  memory,  and  arbitration 
unit.  This  module  establishes  the  system  clock  rate  and 
controls  the  simulation  time. 


D.     CPU


The  CPU  module  simulates  PowerPC-603  memory  accesses. 
The  Sequencer  is  a  sub-module  of  the  CPU  which  makes  the 
Testbench  able  to  simulate  every  transaction  relevant  to  the 
memory  and  PRC.  These  transactions  can  occur  in  any  order. 
Many  of  the  possible  '603  transactions  are  not  applicable  to 
this  particular  system  configuration.  For  example,  none  of 
the  "address  only"  transactions  are  relevant,  since  they  are 
for  systems  with  multiple  processors  or  second-level  caches. 
Bus  arbitration  is  accurately  modeled,  including  the  pipelined 
address  tenures. 

E.     MEMORY

This  module  emulates  the  main  memory  of  the  system.  For 
simulation  efficiency,  the  memory  has  only  enough  physical 
address  space  for  four-beat  burst  reads:  128  bytes.  The 
address  bus  width  allows  a  virtual  address  space  of  four 
Gbytes.  Accesses  to  addresses  past  the  first  128  bytes  map  to 
addresses  within  the  first  128  bytes. 
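
The address folding can be sketched as follows. The text does not state how out-of-range addresses are mapped, so the modulo wrap used here (keeping the low seven address bits) is an assumption, and the function name physical_address is hypothetical.

```python
MEMORY_BYTES = 128  # physical size of the modeled memory, from the text


def physical_address(address):
    """Fold a bus address into the 128-byte modeled memory.

    Assumption: out-of-range addresses wrap modulo 128, which is the
    same as keeping only the low seven address bits.
    """
    return address % MEMORY_BYTES


print(hex(physical_address(0x12345678)))
```

With this mapping, every 32-byte burst lands in one of four physical burst slots regardless of where it sits in the four-Gbyte address space.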

The time required for memory accesses is determined by
the parameters Delay1 and Delay2. The heading in
the file memory.v describes how to adjust these parameters to
achieve a realistic memory access rate.

There  were  two  significant  decisions  made  about  the  main 
memory  design.  First,  the  memory  emulates  a  60-ns  DRAM 
memory. With a 60-ns memory and a 66-MHz system clock, the
four-beat  burst  operation  takes  8-3-3-3  cycles,  that  is,  eight 
cycles  for  the  first  beat  and  three  more  cycles  for  each  of 
the  three  remaining  beats. 

The  second  design  decision  was  to  add  a  cancel  feature  to 
the  main  memory  chip.  The  memory  module  has  an  input  called 
CANX  which  cancels  the  current  read  operation.  It  is  through 
this  signal  that  the  PRC  stops  the  memory  module  from 
delivering  data  to  the  CPU  when  the  PRC  already  has  the  data. 

Another  option  would  be  to  put  the  PRC  between  the  CPU 
and  Memory,  not  allowing  a  read  request  to  get  to  the  memory 
chip until after the PRC had checked its contents. This
scheme  would  increase  the  time  of  all  memory  accesses. 

F.    ARBITER 

The  Arbiter  emulates  the  external  bus  arbitration  unit, 
implemented  as  a  Finite  State  Machine  (FSM)  corresponding  to 
the state diagram in Figure 2.

The  memory  unit  in  this  Testbench  is  capable  of  handling 
up  to  two  memory  accesses  in  the  pipeline  at  a  time,  which  is 
the  maximum  that  the  CPU  will  ever  cause.  Adding  the  PRC  to 
the  system  creates  the  possibility  of  three  accesses  in  the 
pipe.  For  example,  the  PRC  could  initiate  a  third  address 
tenure  before  the  first  of  two  CPU  transactions  is  complete. 
This  potential  problem  is  handled  by  the  Arbiter  which  keeps 
track of the pipelining depth. It will not grant the address
bus to any unit if that address tenure would put a third
transaction  in  the  pipeline.   Rather,  the  Arbiter  will  stall 
until  the  data  tenure  from  the  first  transaction  is  complete, 
after  which  the  Arbiter  will  grant  the  address  bus  to  the 
requesting  unit. 
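
The depth check can be sketched in Python as simple bookkeeping. This is an illustration only, not the Verilog Arbiter; the class and method names are hypothetical, and only the two-transaction limit comes from the text.

```python
MAX_PIPELINE_DEPTH = 2  # the memory model handles at most two accesses at a time


class PipelineTracker:
    """Hypothetical bookkeeping for the Arbiter's pipelining-depth check."""

    def __init__(self):
        # Address tenures started whose data tenures are not yet complete.
        self.depth = 0

    def can_grant_address_bus(self):
        """True if one more address tenure fits in the pipeline."""
        return self.depth < MAX_PIPELINE_DEPTH

    def address_tenure_started(self):
        if not self.can_grant_address_bus():
            raise RuntimeError("grant would put a third transaction in the pipeline")
        self.depth += 1

    def data_tenure_completed(self):
        if self.depth == 0:
            raise RuntimeError("no transaction is in flight")
        self.depth -= 1


tracker = PipelineTracker()
tracker.address_tenure_started()         # first CPU transaction
tracker.address_tenure_started()         # second CPU transaction
print(tracker.can_grant_address_bus())   # the PRC request must wait
tracker.data_tenure_completed()          # first data tenure finishes
print(tracker.can_grant_address_bus())   # the PRC can now be granted the bus
```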


[State diagram figure. Only the state names, Verilog state numbers, and output codes are legible; the transition arcs are not reproduced.]

State (Verilog state number)      Outputs
A: Start (1)                      [1111]
B: Grant CPU addr bus (2)         [0111]
C: Park CPU (3)                   [0011]
D: Grant CPU data bus (4)         [1011]
E: Grant PRC addr bus (5)         [1101]
F: Wait for PRC (6)               [1100]
G: Grant PRC data bus (7)         [1110]

Inputs:  [CPU_BR_, PRC_BR_]
Outputs: [CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_]

Numbers refer to Verilog state numbers.

Figure 2. State diagram for Arbiter FSM.


G.     TEST RESULTS


Testing  the  Testbench  itself  was  important  to  establish 
that the models matched the behavior described in the PowerPC
User's Manual. The Testbench passed all tests of reads,
writes and burst operations, in various sequences of
transactions  and  using  an  assortment  of  memory  access  delays. 
Figure  3  shows  the  fastest  possible  burst  operations,  as 
if  the  memory  access  time  were  not  the  limiting  factor.  Note 
again  that  the  address  tenure  of  the  second  transaction  can 
start  before  the  data  tenure  of  the  first  transaction  is 
complete.


[Waveform figure: traces of the CPU1 signals clk, BR_, BG_, ABB_, DBB_, D[0:63] and TA_.]

Figure 3. Burst write, then burst read. Delay=0. [cWaves
output]


Figure  4  shows  a  burst  write  transaction  with  an  access 
delay  of  three  cycles  and  a  delay  of  one  cycle  in  between  each 
beat. A realistic 60-ns DRAM will have a delay of 8-3-3-3
rather than the 3-1-1-1 shown here. The PRC, however, should be
able to supply data this quickly.

Figure 4. Burst write, burst read. Delay=3-1-1-1. [cWaves
output]


III.   PRC BEHAVIORAL MODEL DESIGN PHASE


This  chapter  presents  the  development  of  the  behavioral 
models  for  the  PRC.  A  simple  pseudocode  model  is  presented 
first.  This  model  was  used  to  develop  an  appropriate  data 
structure  and  block  diagram  for  the  PRC.  The  individual 
blocks  in  this  block  diagram  were  implemented  with  Verilog 
behavioral  modules  and  tested  together  to  verify  the 
behavioral  model  of  the  PRC.  The  next  step  was  to  convert 
each  module  into  a  hardware  model  compatible  with  Epoch, 
detailed  in  the  next  chapter. 

A.  PSEUDOCODE  MODEL 

The  behavior  of  the  PRC  is  explained  in  detail  in  the 
paper by Fouts & Billingsley (1994, p. 113) and summarized in
the  Introduction  chapter  of  this  thesis.  Another  way  of 
summarizing  this  behavior  is  through  a  pseudocode  model  as 
shown  in  Figure  5,  which  is  just  detailed  enough  to  identify 
the  most  significant  capabilities  the  PRC  must  have.  The 
purpose  of  taking  this  approach  was  to  clarify  the  function  of 
the  PRC  and  to  aid  in  identifying  specific  behaviors  of  this 
cache  which  the  hardware  needs  to  exhibit. 

B.  DATA  STRUCTURE 

A possible data structure for the PRC is shown in Figure
6. Each of the 128 lines within the PRC must contain two
addresses, some status information and data. The two
addresses  are  required  to  maintain  the  memory  access  pattern. 
There  are  also  two  seven-bit  pointers,  each  containing  a 
value  in  the  range  of  zero  to  127.  The  ActiveLine  pointer 
contains  the  number  of  the  line  that  is  currently  being  used 
by the PRC. The ReplaceLine pointer contains the number of
the  next  line  to  be  replaced  when  a  new  line  is  needed. 
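
For readers who prefer code to diagrams, the same structure can be sketched in Python. This is an illustrative model only; the class and field names are hypothetical, while the line count, the 32-byte data field, the 27-bit address fields (bits 0:26) and the two seven-bit pointers follow the description and Figure 6.

```python
from dataclasses import dataclass, field
from typing import List

NUM_LINES = 128     # lines in the PRC
LINE_BYTES = 32     # one four-beat burst of 64-bit beats
ADDRESS_BITS = 27   # PredMA and MRMA are drawn as bits 0:26


@dataclass
class Line:
    pred_ma: int = 0                 # predicted memory address (PredMA)
    mrma: int = 0                    # most recent memory address (MRMA)
    valid: bool = False              # V status flag
    aged: bool = False               # A status flag
    data: bytes = bytes(LINE_BYTES)  # 32 bytes of prefetched data


@dataclass
class PrcState:
    lines: List[Line] = field(
        default_factory=lambda: [Line() for _ in range(NUM_LINES)]
    )
    active_line: int = 0    # seven-bit pointer, 0..127
    replace_line: int = 0   # seven-bit pointer, 0..127


state = PrcState()
print(len(state.lines), len(state.lines[0].data))
```

Seven bits are exactly enough for the pointers because 2**7 equals the 128 lines.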
*** PRC BEHAVIOR MODEL IN PSEUDOCODE ***

// CAR    = current address register
// MRMA   = most recent memory address
// PredMA = predicted memory address

always at negative edge of HRESET_
    clear all status flags;
    put PRC in IDLE state;
    ActiveLine = 0; ReplaceLine = 0;

<IDLE>
wait for next transaction

CASE (transaction)

    data burst-read:
        if CAR hits in PRC,    // PRC has requested data
            switch ActiveLine to line that was hit;
            send data to CPU;
            send cancel signal to memory;
            predict next address;
            if next address is not already in PRC,
                read next address;
                store in ActiveLine;
                update MRMA and PredMA;
        else if CAR misses,    // PRC does not have requested data
            switch ActiveLine to the next ReplaceLine;
            if this is the first miss for this line,
                store this address in MRMA;
            if this is the second miss for this line,
                initiate search for next ReplaceLine;
                predict next address;
                if next address not already in PRC,
                    read next address;
                    store in ActiveLine;
                    update MRMA and PredMA;

    burst-write, or write:
        if CAR hits,
            flush matching line;

    data read or instruction transaction:
        ignore;

endcase;

goto IDLE;

Figure 5. PRC Pseudocode Model

[Data structure figure: each of the 128 lines holds PredMA (0:26), MRMA (0:26), the status flags V and A, and 32 bytes of DATA arranged as four 64-bit words. The ActiveLine pointer selects one line.]

PredMA = Predicted Memory Address
MRMA = Most Recent Memory Address
V = Valid
A = Aged

Figure 6. PRC Data Structure.

C.    BLOCK DIAGRAM

The  pseudocode  model  revealed  several  specific  tasks  the 
PRC  must  be  able  to  accomplish.  Identifying  and  clarifying 
these  tasks  resulted  in  the  development  of  six  blocks  within 
the  PRC.  These  blocks  are  shown  in  the  block  diagram  of 
Figure  7  and  are  described  briefly  here. 

The  Snooper  watches  transactions  between  the  CPU  and 
memory,  raising  appropriate  signals  if  the  transaction  is  one 
in  which  the  PRC  is  interested. 

The  Line  Manager  contains  the  Address  List  and  Line 
Replacement  Unit  as  sub-blocks.  The  Address  List  contains  all 
the  recently-accessed  memory  addresses  and  all  the  predicted 
addresses.  The  Line  Replacement  Unit  determines  which  of  the 
128  lines  will  be  replaced  the  next  time  a  new  line  is  needed. 
These two blocks are grouped together because they share
status information about the lines and work closely together
for line management.

The  Predictor  module  uses  its  two  input  addresses  to 
predict  its  output  address. 

The  Data  List  stores  128  lines  of  data,  32  bytes  in  each 
line,  which  is  the  amount  of  data  in  each  burst  read  or  burst 
write.

The  Bus  Interface  handles  the  protocol  of  data  transfers 
into and out of the PRC.

Finally,  the  Controller  coordinates  the  actions  of  all 
the  other  functional  blocks  to  accomplish  the  mission  of  the 
PRC.

Figure 7. PRC Block Diagram.

D.     CONTROLLER

This  module  is  a  Finite  State  Machine  which  coordinates 
the actions of all the other functional blocks of the PRC.
All  control  signals  are  synchronous  with  the  system  clock. 
HRESET_ causes the Controller to go to the IDLE state. The
state output table and state diagram are shown in Figures 8
and 9, respectively.

Figure 8. Controller State Output Table.

Figure 9. Controller State Diagram.

E.     SNOOPER

This  module  watches  the  system  bus  activity  and  makes 
appropriate  reports  to  the  PRC  Controller. 

If  the  transaction  is  a  data  burst  read  or  any  kind  of 
write  and  if  the  address  parity  is  correct,  then  two  actions 
occur.  First,  read  or  write  is  asserted  as  appropriate. 
Second,  the  address  is  placed  in  the  Current  Address  Register 
(CAR). The snoop_ignore signal tells this unit to ignore the
current  transaction,  because  it  was  initiated  by  the  Bus 
Interface  Unit.  The  snoop_ignore  signal  must  be  asserted 
concurrently  with  the  transfer  attributes. 

Reads that are not data burst reads are
ignored by the PRC. The CAR is updated only on transactions
relevant  to  the  PRC. 
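
The filtering rule can be sketched in Python as follows. This is an illustration rather than the Verilog Snooper: the class name, the method name and the boolean arguments are hypothetical stand-ins for the decoded '603 transfer attributes, and the hold handshake is not modeled.

```python
class Snooper:
    """Hypothetical behavioral sketch of the Snooper's filtering rule."""

    def __init__(self):
        # Current Address Register; None until a relevant transaction is seen.
        self.car = None

    def observe(self, address, is_data, is_burst, is_write,
                parity_ok=True, snoop_ignore=False):
        """Return 'read', 'write', or None for one address tenure."""
        if snoop_ignore or not parity_ok:
            return None          # PRC-initiated transfer, or bad address parity
        if is_write:
            signal = "write"     # any kind of write is reported
        elif is_data and is_burst:
            signal = "read"      # only data burst reads are reported
        else:
            return None          # other reads are ignored
        self.car = address       # CAR changes only on relevant transactions
        return signal


snooper = Snooper()
print(snooper.observe(0x20, is_data=True, is_burst=True, is_write=False))
print(snooper.observe(0x40, is_data=True, is_burst=False, is_write=False))
```

The first call reports a read and loads the CAR; the second, a single-beat read, is ignored and leaves the CAR at 20h.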

Due  to  the  two-stage  pipelining  capability  of  the  PowerPC 
with  respect  to  memory  accesses,  a  second  address  tenure  can 
occur  shortly  after  the  first,  well  before  the  first  data 
tenure  is  complete.  To  compensate  for  this,  the  read  and 
write outputs of the Snooper remain asserted until acknowledged
by the Controller with hold. The rising edge of hold
indicates that the read or write signal was received by the
Controller.  The  Snooper  then  can  negate  these  signals  but 
must  leave  CAR  alone  until  hold  is  negated.  After  hold  is 
negated, CAR can be updated to the new address.

F.     LINE MANAGER

This module contains the address list, status flags for
each line (Valid, Aged), a general status flag (line_empty),
the line replacement unit, and a couple of pointers
(ActiveLine, ReplaceLine). On HRESET_, Valid=0 (all lines),
Aged=0 (all lines), line_empty=1, ActiveLine=0.

The  MRMA  output  is  always  the  MRMA  of  the  ActiveLine. 
The  line_empty  flag  indicates  that  the  currently  active  line 
has  no  addresses  in  it  yet;  therefore,  the  addresses  cannot  be 
used  by  the  PRC  to  make  a  prediction. 

The  input  a_select  determines  which  address  input  is  used 
for  a  particular  operation.  The  two  address  inputs  are  the  CAR 
and  the  NAR. 

When  the  Line  Manager  receives  a  test  signal,  it  compares 
the  input  address  with  the  contents  of  the  PredMA  List.  If 
there  is  a  match  with  the  CAR,  it  asserts  the  hit  signal  and 
changes  the  ActiveLine  pointer  to  the  line  number  of  the  hit. 

If  there  is  a  miss  with  the  CAR,  then  the  ActiveLine 
switches  to  the  same  line  to  which  ReplaceLine  points. 

If,  during  a  test,  there  is  a  match  with  the  NAR,  two 
actions  occur.  First,  hit  is  asserted.  Second,  the  value  in 
ActiveLine  becomes  irrelevant  since  it  will  not  be  used.  If 
there is a miss with the NAR, the ActiveLine must remain
unchanged by the test.

The fetch_done signal from the Bus Interface causes the
NAR to be stored in PredMA[ActiveLine], the CAR to be stored
in MRMA[ActiveLine], the Valid flag to be set, and the Aged
flag to be reset.

The flush signal causes the current ActiveLine to become
invalid by setting Valid[ActiveLine] = 0.

The  store  signal  causes  the  input  address  to  be  stored 
into  the  MRMA  of  the  ActiveLine.  This  is  only  used  for  the 
first  address  in  a  new  line.  The  store  signal  also  causes  the 
line_empty flag to be reset.

Line  replacement:  ReplaceLine  always  points  to  the  line 
to  be  replaced  at  the  next  PRC  miss.  HRESET_  causes  this  to 
be  zero. 

As  soon  as  the  PRC  starts  predicting  the  first  address 
for a line, it asserts new_replace. The replacement unit then
finds a new line to mark as the next ReplaceLine according to the
following procedure.

Done = false;
repeat
    ReplaceLine = ReplaceLine + 1;   (mod 128 addition)
    if not(Valid[ReplaceLine])
        Done = true;
    elseif (all_line_are_valid AND Aged[ReplaceLine]) then
        Done = true;
    else
        Aged[ReplaceLine] = 1;
until Done;
line_empty = 1;

In  words,  the  Line  Replacement  Unit  searches  sequentially 
for  the  next  line  with  invalid  data  and  marks  that  line  as  the 
next  line  to  be  replaced.  If  all  lines  contain  valid  data, 
then it scans for the next line that is "aged," indicated by
a set Aged flag. As it scans for an aged line, it sets the
Aged  bits  in  the  "unaged"  lines  it  passes.  Therefore,  as  it 
wraps  around  in  the  search  for  an  aged  line,  it  will 
eventually  come  upon  one,  even  if  none  were  aged  when  the 
search  began. 

All  of  this  occurs  while  the  PRC  is  fetching  data. 
Therefore,  the  PRC  has  several  clock  periods  in  which  to 
complete  the  search. 
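
A runnable rendering of the search procedure may make its termination easier to see. The Python below is a sketch, not the hardware implementation; the function name next_replace_line is hypothetical, and the list length stands in for the 128 lines so that small cases can be tried.

```python
def next_replace_line(replace_line, valid, aged):
    """Return the next ReplaceLine, setting Aged flags on lines passed over.

    `valid` and `aged` are lists of booleans, one entry per line. The
    `aged` list is modified in place, mirroring the pseudocode.
    """
    num_lines = len(valid)       # 128 in the PRC
    all_valid = all(valid)       # does not change during the search
    while True:
        replace_line = (replace_line + 1) % num_lines
        if not valid[replace_line]:
            return replace_line  # an invalid line is always acceptable
        if all_valid and aged[replace_line]:
            return replace_line  # cache full: take the first aged line
        aged[replace_line] = True  # mark the unaged line while passing it


valid = [True, True, True, True]
aged = [False, False, False, False]
print(next_replace_line(0, valid, aged), aged)
```

In this four-line example every line is valid and none is aged, so the search ages all four lines on the first pass and selects line 1 on the second. The loop therefore ends within two passes over the lines.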

G.     PREDICTOR

The  Predictor  module  has  two  address  inputs,  the  Most 
Recent  Memory  Address  (MRMA)  and  the  Current  Address  (stored 
in the Current Address Register, CAR). It has a single
output,  the  Next  Address  which  is  stored  in  the  Next  Address 
Register,  NAR. 

This  module  calculates  the  Next  Address  based  on  the  Most 
Recent  Memory  Address  and  the  Current  Address.  The  rising 
edge  of  predict  initiates  the  prediction  calculation.  The 
original  equation  is 

NAR  =  CAR  +  (CAR  -  MRMA) 

which  is  implemented  as 

NAR  =  2*CAR  -  MRMA. 
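
The second form suits hardware because doubling is a one-bit left shift, leaving a single subtraction. A Python sketch of the calculation follows; the 27-bit width is taken from the PredMA and MRMA fields (0:26), and the wrap-around on overflow or underflow is an assumption, as is the function name predict.

```python
ADDRESS_BITS = 27                       # assumed width, from fields drawn as 0:26
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def predict(car, mrma):
    """Compute NAR = 2*CAR - MRMA, truncated to the address width."""
    return ((car << 1) - mrma) & ADDRESS_MASK


print(hex(predict(0x40, 0x20)))   # ascending pattern
print(hex(predict(0x20, 0x40)))   # descending pattern
```

For the ascending pattern 20h, 40h the prediction is 60h, matching the example in the Introduction; for the descending pattern 40h, 20h it is 00h.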

The output NAR remains latched and valid until the next
leading edge of predict.

H.     DATA LIST

The  inputs  to  the  Data  List  are  upload,  download  and 
ActiveLine.   The  256-bit  bus  data_line  is  an  input  and  output. 

An  upload  signal  causes  the  Data  List  to  store  the  data 
on  data_line  into  the  address  specified  by  ActiveLine.  A 
download  signal  causes  the  Data  List  to  assert  onto  data_line 
the  data  in  the  address  specified  by  ActiveLine. 

I.     BUS  INTERFACE  UNIT 

This module handles the protocol of data transfers into
and  out  of  the  PRC,  coordinating  these  activities  through  the 
use  of  a  Finite  State  Machine. 

When  this  module  receives  a  fetch  signal,  it  latches  the 
address  in  the  NAR  and  requests  the  bus  for  a  burst  read.  It 
stores the incoming data until all four beats have been
received. Then, it uploads the data into the Data List and
asserts fetch_complete.

When  this  module  receives  a  send  signal,  it  sends  a 
cancel  signal  (CANX)  to  the  memory  module,  downloads  data  from 
the Data List and then sends the data to the CPU. When the
transfer  is  finished,  it  asserts  send_done. 
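
The buffering step of a fetch can be sketched in Python as the assembly of one 256-bit line from four 64-bit beats. The function name assemble_line is hypothetical, and placing the first beat in the most significant bytes is an assumption about byte order that the text does not state.

```python
BEATS_PER_BURST = 4
BYTES_PER_BEAT = 8   # 64-bit data bus


def assemble_line(beats):
    """Join four 64-bit beats (given as integers) into one 32-byte line.

    Assumption: the first beat occupies the first eight bytes and each
    beat is stored most significant byte first.
    """
    if len(beats) != BEATS_PER_BURST:
        raise ValueError("a burst read delivers exactly four beats")
    return b"".join(beat.to_bytes(BYTES_PER_BEAT, "big") for beat in beats)


line = assemble_line([1, 2, 3, 4])
print(len(line))
```

Only after the fourth beat has arrived is the complete 32-byte line uploaded to the Data List, so a partially received burst never becomes visible as valid data.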

J.     PREDICTION  TESTS 

There  are  two  large-scale  tests  included  in  this  thesis. 
The  first  is  the  Prediction  Test.  The  second  is  the  Line 
Replacement  Test.  Together,  these  tests  are  sufficient  to 
demonstrate that the behavioral model functions as desired.
Once the behavioral model of the PRC passed these tests, it
was ready for conversion to a hardware model.

The  tests  are  both  conducted  by  connecting  the  behavioral 
model  of  the  PRC  to  the  Testbench  described  in  the  previous 
chapter  and  running  a  simulation  with  a  sequence  of  events. 
The sequence of events for the Prediction Test is included in
the sequencer4.v file. The sequence of events for the Line
Replacement Test is located in the sequencer5.v file. The
following procedure lists the steps necessary to conduct a
test:

1.  Change directories (cd) to the .../verilog/behavior/
directory.

2.  Modify the file verilog_arguments so that it contains
sequencer4.v or sequencer5.v as desired and all the
parts to the PRC and to the Testbench.

3.  Modify the file testbench.v to set the simulation
duration as described in the heading of the desired
sequencer. Modify the trace flags in every file
listed in verilog_arguments as described in the
sequencer file.

4.  At the Unix command prompt, enter the command verilog
-f verilog_arguments.

The  Verilog-XL  outputs  of  both  tests  are  included  in  the 
appendices.  Together,  these  tests  show  that  this  behavioral 
model  performs  all  the  desired  functions. 

The  Prediction  Test,  using  Sequencer4,  causes  a  series  of 
CPU  transactions  that  tests  the  ability  of  the  PRC  to  make  the 
prediction  calculation  and  to  fetch  the  data.  The 
transactions  are  as  follows: 

burst_read at 00h:    The PRC stores this address.

burst_read at 20h:    The PRC should predict a next address of 40h
                      and then fetch the data from that address.

burst_read at 180h:   The PRC should store this address in a new
                      line.

burst_read at 1A0h:   The PRC should predict a next address of 1C0h
                      and then fetch the data from that address.

burst_read at 40h:    This data is already in the PRC, so the PRC
                      should send it to the CPU and then fetch data
                      from 60h.

burst_write at 1C0h:  This data is in the PRC, so this line should
                      be flushed.

burst_read at 60h:    The PRC should deliver this data to the CPU
                      and then fetch the data at 80h.

burst_read at 100h:   The PRC should start a new line and store
                      this address.
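
The predicted addresses in this sequence follow from the prediction equation NAR = 2*CAR - MRMA. The short Python check below recomputes them. It is an illustrative sketch only: it assumes that, for each access, the MRMA is the preceding address of the same stream (for example, 20h when the CPU reads 40h), and the function name is hypothetical.

```python
def next_address(car: int, mrma: int) -> int:
    """Prediction equation from the Predictor description."""
    return 2 * car - mrma


# (MRMA, CAR, expected NAR) read off the transaction list.
# The MRMA pairing is an assumption of this sketch.
cases = [
    (0x000, 0x020, 0x040),
    (0x180, 0x1A0, 0x1C0),
    (0x020, 0x040, 0x060),
    (0x040, 0x060, 0x080),
]

for mrma, car, expected in cases:
    nar = next_address(car, mrma)
    assert nar == expected
    print(f"MRMA={mrma:03X}h  CAR={car:03X}h  ->  NAR={nar:03X}h")
```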

This  test  successfully  demonstrates  a  majority  of  the 
capabilities  of  the  PRC,  showing  when  the  Line  Manager  selects 
new  lines,  when  and  how  the  Predictor  functions,  and  when  the 
CPU  starts  a  read  or  write  and  the  data  involved.  The  test 
shows  when  the  Bus  Interface  Unit  fetched  data  from  memory. 
The  Data  List  reported  the  flow  of  data  in  and  out  of  itself. 

The  only  significant  behavior  not  exercised  by  this  test 
is  the  function  of  the  Line  Replacement  Unit  when  the  PRC  is 
full. That is handled with Sequencer5 in the Line Replacement
Test.

The Line Replacement Test was accomplished by a series of
CPU transactions that quickly fill the PRC. The test shows
that the Line Replacement Unit correctly selected invalid
lines to be replaced first. When all the lines in the PRC
contained valid data, the Line Replacement Unit executed the
algorithm described in the section on the Line Replacement
Unit.

K.     CONCLUSION 

At  this  point  in  the  development  of  the  PRC,  the 
behavioral  model  was  functioning  properly.  Therefore,  it 
could  be  converted  piece  by  piece  into  a  hardware  model.  This 
was  accomplished  using  the  subset  of  Verilog  understood  by 
Epoch, as described in the next chapter.

IV: PRC STRUCTURAL MODEL DESIGN PHASE


This  chapter  presents  the  development  of  the  hardware 
model  of  the  PRC.  In  this  phase  of  the  design  process,  each 
of  the  behavioral  blocks  developed  in  the  previous  phase  was 
implemented with hardware. Converting the blocks in order of
increasing complexity worked well, because starting with the
simpler blocks made it easier to concentrate first on learning
how to use Epoch.

Like  the  behavioral  models,  the  hardware  (structural) 
models  are  Verilog  files.  Epoch  uses  these  Verilog  files  to 
create  VLSI  layouts.  From  those  layouts,  Epoch  calculates 
timing  information  and  generates  new  VerilogOut  files  with 
this  timing  information.  As  each  block  is  converted  into 
hardware,  the  new  VerilogOut  model  can  replace  the  original 
behavioral  model  in  the  Testbench  for  testing  with  Verilog-XL. 
The following hardware blocks result from using this
procedure.

Each  section  of  this  chapter  also  includes  a  figure 
displaying  some  important  geometric  information  about  the 
module,  including  surface  area  and  transistor  count.  This 
information can be obtained from Epoch with the shell command
geostat -trancount <module name>.

A.    PRC 

The  top  level  module  is  only  a  connection  of  each  of  the 
modules  described  in  the  following  sections.  The  geostat 
information  is  shown  in  Figure  10.  Of  particular  significance 
are  the  transistor  count  and  the  total  chip  area. 

Bounding Box:
9080.748 x 11278.224 microns, 102414707.226 square microns.
357.510 x 444.025 mils, 158743.109 square mils.
Number of Pins = 316.
Number of unique cells = 6.
Number of Datapaths = 1
Number of Sub-Glues = 5
Total Number of Instances = 6
Total number of nets = 498.
Total metal1 layer route length = 2120297.98 microns.
Total metal2 layer route length = 699802.75 microns.
Total metal3 layer route length = 0.00 microns.
Total route length = 2820100.74 microns.
Total number of vias = 2460.
Total number of segments = 16989.
Reading transistor view ...
Total number of 454310 transistors
0.349 Square mils per Transistor.
2.862 Transistors per square mil.
Power Dissipation = 4742486.500 micro-watts.

Figure 10.   PRC Geostat Information. [Epoch output]

B.    CONTROLLER

This  module  is  a  Finite  State  Machine  which  coordinates 
the  actions  of  all  the  other  functional  blocks  of  the  PRC. 
All  control  signals  are  synchronous  with  the  system  clock. 
HRESET_  causes  the  Controller  to  go  to  the  IDLE  state.  The 
revised  state  output  table  (Figure  11)  and  the  revised  state 
diagram  (Figure  12)  give  more  details. 

Of significance are the wait states added to the state
diagram of the behavioral model. These changes are boldface
in the Revised Controller State Output Table. The changes
were required by the Line Manager, in which there is a
significant propagation delay for the addresses. This delay
is described in more detail in the Line Manager section of
this chapter and is a prime candidate for future work to
improve this design of the PRC. The geostat information is
shown in Figure 13.


Controller State Output Table

STATE          a select  test  predict  store  flush  send  hold  new replace  fetch
IDLE           CAR       0     0        0      0      0     0     0            0
WAIT_A         CAR       0     0        0      0      0     1     0            0
WAIT_B         CAR       0     0        0      0      0     1     0            0
WAIT_C         CAR       0     0        0      0      0     1     0            0
WAIT_D         CAR       0     0        0      0      0     1     0            0
WAIT_E         CAR       0     0        0      0      0     1     0            0
WAIT_F         CAR       0     0        0      0      0     1     0            0
TEST_CAR(R)    CAR       1     0        0      0      0     1     0            0
SEND_DATA      NAR       0     1        0      0      1     0     0            0
TEST_NAR       NAR       1     0        0      0      0     0     0            0
FETCH_DATA     NAR       0     0        0      0      0     0     0            1
IS_LINE_EMPTY  X         0     0        0      0      0     1     0            0
PREDICT_NA     NAR       0     1        0      0      0     1     1            0
WAIT_G         NAR       0     0        0      0      0     1     0            0
WAIT_H         NAR       0     0        0      0      0     1     0            0
WAIT_I         NAR       0     0        0      0      0     1     0            0
STORE_CAR      CAR       0     0        1      0      0     1     0            0
TEST_CAR(W)    CAR       1     0        0      0      0     1     0            0
FLUSH_LINE     X         0     0        0      1      0     1     0            0

Figure 11.   Revised Controller State Output Table. Changes
highlighted.

[State diagram not reproduced. Legible labels: HRESET_; states
SEND_DATA, PREDICT_NA, WAIT_G, WAIT_H, WAIT_I, TEST_NAR and
FETCH_DATA; transition conditions "hit=1 OR read OR write" and
"fetch_done OR fetch_abort".]

Figure 12.   Revised Controller State Diagram

Bounding Box:
267.516 x 215.964 microns, 57773.825 square microns.
10.532 x 8.503 mils, 89.550 square mils.
Number of Pins = 26.
Number of unique cells = 18.
Number of Standard cells = 60
Total Number of Instances = 60
Total number of nets = 71.
Total metal1 layer route length = 7073.14 microns.
Total metal2 layer route length = 7073.46 microns.
Total metal3 layer route length = 0.00 microns.
Total route length = 14146.60 microns.
Total number of vias = 226.
Total number of segments = 1074.
Reading transistor view ...
Total number of 460 transistors.
0.195 Square mils per Transistor.
5.137 Transistors per square mil.
Power Dissipation = 3665.888 micro-watts.

Figure 13.   Controller Geostat Information. [Epoch output]

C.     SNOOPER

This  module  watches  the  system  bus  activity  and  makes 
appropriate  reports  to  the  PRC  Controller. 

If  the  transaction  is  a  data-burst  read  or  any  kind  of 
write  and  if  the  address  parity  is  correct,  then  the  read  or 
write  signal  is  asserted  as  appropriate.  Also,  the  address  is 
placed  in  the  CAR.  The  snoop_ignore  signal  tells  this  unit  to 
ignore  the  current  transaction,  because  it  was  initiated  by 
the  Bus  Interface  Unit.  The  snoop_ignore  signal  must  be 
asserted concurrently with the transfer attributes. Reads
that are not burst or data related are ignored by the PRC.
The CAR is updated only on transactions relevant to the PRC.

Due  to  the  two-stage  pipelining  capability  of  the  PowerPC 
with  respect  to  memory  accesses,  a  second  address  tenure  can 
occur  shortly  after  the  first,  well  before  the  first  data 
tenure  is  complete.  To  compensate  for  this,  the  read  and 
write  outputs  of  the  Snooper  remain  asserted  until 
acknowledged  by  the  Controller  with  hold.  The  rising  edge  of 
hold  indicates  that  the  read  or  write  signal  was  received  by 
the  Controller.  The  Snooper  then  can  negate  these  signals, 
but  must  leave  CAR  alone  until  hold  is  negated.  After  hold  is 
negated,  CAR  can  be  updated  to  the  new  address. 

In  Stage  0,  the  transfer  attributes  are  latched  in 
registers.  Combinational  logic  determines  if  these  transfer 
attributes  represent  a  valid  read  or  a  valid  write  and  if  the 
address  parity  is  correct.  If  the  transaction  is  valid  and 
one in which the PRC is interested, then Stage 0 raises a
transaction_waiting signal.

A Finite State Machine in Stage 1 sits in the IDLE
state until it receives the transaction_waiting signal. Then
it latches the signals needed from Stage 0, resets the
transaction_waiting signal and then waits for the hold signal
to  go  low.  A  high  hold  signal  indicates  that  the  PRC  is  not 
done  with  the  previous  transaction.  Once  hold  goes  low,  the 
read  and  write  flags  are  set  according  to  the  type  of  the 
current  transaction.  Also,  the  input  address  is  stored  in  the 
Current  Address  Register.  The  FSM  then  waits  for  the  rising 
edge  of  hold  before  returning  to  the  IDLE  state  where  it  can 
check  if  there  is  another  transaction  waiting.  The  geostat 
information  is  shown  in  Figure  14. 
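
The handshake performed by the second-stage state machine can be sketched in Python as follows. This is a simplified software model, not the structural Verilog: the state names other than IDLE, the class name and the `step` interface are hypothetical, and the rising edge of hold is approximated by observing hold high after it has been seen low.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_HOLD_LOW = auto()   # hypothetical name: previous transaction still in progress
    WAIT_HOLD_HIGH = auto()  # hypothetical name: waiting for the Controller's acknowledgement


@dataclass
class SnooperStage1:
    state: State = State.IDLE
    read: bool = False
    write: bool = False
    car: int = 0
    _is_write: bool = False
    _addr: int = 0

    def step(self, transaction_waiting: bool, is_write: bool, addr: int, hold: bool) -> bool:
        """Advance one clock; return True when the Stage 0 flag should be reset."""
        if self.state is State.IDLE:
            if transaction_waiting:
                # Latch what is needed from Stage 0.
                self._is_write, self._addr = is_write, addr
                self.state = State.WAIT_HOLD_LOW
                return True
        elif self.state is State.WAIT_HOLD_LOW:
            if not hold:
                # The PRC is done with the previous transaction.
                self.read, self.write = not self._is_write, self._is_write
                self.car = self._addr
                self.state = State.WAIT_HOLD_HIGH
        elif self.state is State.WAIT_HOLD_HIGH:
            if hold:
                # The Controller has acknowledged; negate the flags, keep CAR.
                self.read = self.write = False
                self.state = State.IDLE
        return False
```

In this sketch the CAR is written only when hold is low, which mirrors the requirement that the Snooper leave the CAR alone until hold is negated.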

Bounding Box:
607.500 x 409.536 microns, 248793.127 square microns.
23.917 x 16.123 mils, 385.630 square mils.
Number of Pins = 88.
Number of unique cells = 19.
Number of Standard cells = 169
Total Number of Instances = 169
Total number of nets = 219.
Total metal1 layer route length = 28547.10 microns.
Total metal2 layer route length = 14615.39 microns.
Total metal3 layer route length = 0.00 microns.
Total route length = 43162.49 microns.
Total number of vias = 464.
Total number of segments = 2268.
Reading transistor view ...
Total number of 3608 transistors.
0.107 Square mils per Transistor.
9.356 Transistors per square mil.
Power Dissipation = 26722.156 micro-watts.

Figure 14.   Snooper Geostat Information. [Epoch output]

D.    LINE  MANAGER 

This structural model uses a high speed RAM (hsram) for
the MRMA List. The CAR is stored into this RAM on a store or
fetch_done signal.

The  predicted_ma_list  is  a  register  file  for  storing 
predicted  memory  addresses.  This  list  is  composed  of  128 
address  registers,  128  equality  comparators  and  128  Valid 
status  flags.  The  NAR  is  stored  in  this  list  at  the 
fetch_done  pulse.  If  there  is  a  match  with  the  input  address 
(in_addr), a priority encoder (ENC_C) determines which line
matches.

The  Line  Replacement  Unit  determines  the  next  line  to  be 
replaced  whenever  the  PRC  needs  to  start  a  new  line.  It  first 
selects  invalid  lines.  If  all  the  lines  are  valid,  then  it 
selects  lines  that  have  been  "aged."  A  priority  encoder 
(ENC_D) chooses the line with the lowest index among all the
lines  that  can  be  replaced.  If  all  lines  are  valid,  the 
output  enable  (oe)  signal  of  the  encoder  is  used  to  cause 
aging.  A  line  X  can  be  replaced  if  the  following  holds  true 
for  that  line: 

not (X = ActiveLine) AND {not Valid[X] OR (all_lines_valid
AND Aged[X])}
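
The replacement condition and the lowest-index selection can be sketched in Python as follows. This is an illustrative model only; the function names are hypothetical, and the flags are represented as plain lists of booleans rather than hardware registers.

```python
from typing import List, Optional


def can_replace(x: int, active_line: int, valid: List[bool], aged: List[bool]) -> bool:
    """Replacement condition for line x, as stated in the text."""
    all_lines_valid = all(valid)
    return x != active_line and (not valid[x] or (all_lines_valid and aged[x]))


def select_line(active_line: int, valid: List[bool], aged: List[bool]) -> Optional[int]:
    """Priority-encoder behaviour: lowest-index replaceable line.

    Returns None when no line qualifies. The text ties that situation
    to the encoder's oe signal, which is used to cause aging.
    """
    for x in range(len(valid)):
        if can_replace(x, active_line, valid, aged):
            return x
    return None


# Invalid lines are chosen first; the active line is never chosen.
print(select_line(1, [True, False, True, False], [False] * 4))  # prints 3
```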

Aging is accomplished by the use of a seven-bit counter
(ager_counter), initially set to zero. When the cause_aging
signal from the encoder is high, the counter advances. A
decoder (DEC_B) output causes the appropriate Aged flag to be
set.

A change in the value of the CAR or NAR has a propagation
delay of 25 ns (1.8 cycles) through the input address
multiplexer (in_addr mux). This required the addition of wait
states  in  the  Controller  before  each  of  the  tests.  The 
Revised  Controller  State  Output  Table  and  the  Revised 
Controller  State  Diagram  found  in  the  Controller  section  of 
this  chapter  show  the  required  changes.  The  geostat 
information  is  shown  in  Figure  15. 

Bounding Box:
6704.064 x 8897.364 microns, 59648499.103 square microns.
263.940 x 350.290 mils, 92455.359 square mils.
Number of Pins = 505.
Number of unique cells = 22.
Number of Standard cells = 123
Number of Blocks = 1
Number of Sub-Glues = 2
Total Number of Instances = 126
Total number of nets = 357.
Total metal1 layer route length = 1017746.50 microns.
Total metal2 layer route length = 463265.70 microns.
Total metal3 layer route length = 0.00 microns.
Total route length = 1481012.19 microns.
Total number of vias = 2157.
Total number of segments = 10524.
Reading transistor view ...
Total number of 207467 transistors.
0.446 Square mils per Transistor.
2.244 Transistors per square mil.
Power Dissipation = 1777694.500 micro-watts.

Figure 15.   Line Manager Geostat Information. [Epoch output]


E.    PREDICTOR

The  purpose  of  this  module  is  to  calculate  the  Next 
Address  (stored  in  NAR)  based  on  the  Most  Recent  Memory  Access 
(MRMA) and the Current Address (in the CAR). The prediction
calculation  is 

NAR  =  2*CAR  -  MRMA 

In this structural implementation of the Predictor, the
predict signal latches the CAR and MRMA registers.
The subtraction is accomplished as a two's complement addition
with a high speed adder.

The CAR is multiplied by two, which is an arithmetic shift left of
one  bit.  The  most  significant  bit  of  the  CAR  is  not  retained, 
as  it  will  not  have  an  effect  on  the  27-bit  output  of  the 
adder.  This  will  adversely  affect  address  prediction  only 
around  the  midpoint  of  the  four  gigabytes  of  memory.  The 
applicable  Golden  Rule  of  computer  design  "is  to  make  the 
common  case  fast:  In  making  a  design  tradeoff,  favor  the 
frequent  case  over  the  infrequent  case."  (Hennessy,  1990) 

A number is negated in two's complement by inverting all
the bits and adding '1'. The MRMA is negated by inverting all
its bits. Adding the required '1' is implemented as a
Carry-In to the adder.
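
The shift, invert and carry-in steps can be illustrated with integer arithmetic in Python. This is a sketch under stated assumptions: both operands are treated as 27-bit unsigned values (how bus addresses map onto those 27 bits is not shown here), and the function name is hypothetical.

```python
WIDTH = 27                 # width of the adder output, from the text
MASK = (1 << WIDTH) - 1


def predictor_adder(car: int, mrma: int) -> int:
    """Compute 2*CAR - MRMA modulo 2**27 using only shift, invert and add."""
    doubled = (car << 1) & MASK   # shift left one bit; the old MSB is dropped
    inverted = ~mrma & MASK       # invert every bit of MRMA
    carry_in = 1                  # supplies the '+1' of the two's complement
    return (doubled + inverted + carry_in) & MASK


# The result matches plain subtraction whenever it fits in 27 bits.
for car, mrma in [(0x20, 0x00), (0x1A0, 0x180), (0x40, 0x60)]:
    assert predictor_adder(car, mrma) == 2 * car - mrma

print(hex(predictor_adder(0x20, 0x00)))  # prints 0x40
```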

The  Epoch  TACTIC  tool  reported  the  propagation  delay  from 
predict to NAR to be 4.90 ns. The geostat information is
shown  in  Figure  16. 

Bounding Box:
261.900 x 895.824 microns, 234616.293 square microns.
10.311 x 35.269 mils, 363.656 square mils.
Number of Pins = 113.
Number of unique cells = 10.
Number of Blocks = 107
Total Number of Instances
[...]
Power Dissipation = 27722.887 micro-watts.

Figure 16.   Predictor Geostat Information. [Epoch output]

F.     DATA  LIST 

This  module  stores  the  data  retrieved  from  memory  in 
anticipation  of  a  request  by  the  CPU.  The  basic  memory  cell 
is the Epoch part hsramoe (high speed ram with output enable).
Since each hsram has a maximum word size of 128 bits, there
are two hsram parts in parallel to get the required 256-bit
width.

An  upload  signal  causes  the  Data  List  to  store  the  data 
on data_line into the address specified by ActiveLine. The
input upload has to be inverted to match the active-low WR
input of the Epoch hsram component. A download signal causes
the Data List to assert onto data_line the data in the address
specified by ActiveLine. This signal also has to be inverted
for the same reason.

Both inverters can probably be removed if the Bus
Interface Unit makes the upload and download signals active
low. That could only improve the response time of the data
memory.

Epoch  calculated  the  following  timing  delays: 

download   -> hsramoe.DOUT    2.3 ns
ActiveLine -> hsramoe.DOUT    7.3 ns

A design alternative is to use the regular speed version,
ramoe, which gives the following timing delays:

download   -> ramoe.DOUT      4 ns
ActiveLine -> ramoe.DOUT     16 ns

Using  this  slower  RAM  is  possible,  but  would  require  a 
significant  modification  to  the  PRC  behavior  to  handle  the 
longer  delay  and  would  add  a  cycle  delay  to  CPU  reads  when 
there  is  a  hit  in  the  PRC. 
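
The extra cycle can be seen from a rough budget against the 15 ns system clock period quoted in the Testing section of this chapter. The Python sketch below is a simplification: it considers only the ActiveLine-to-DOUT delay and ignores setup times and any other logic in the path, and the function name is hypothetical.

```python
import math

CLOCK_PERIOD_NS = 15.0  # system clock period quoted in the Testing section


def cycles_needed(delay_ns: float, period_ns: float = CLOCK_PERIOD_NS) -> int:
    """Whole clock cycles required to cover a propagation delay."""
    return math.ceil(delay_ns / period_ns)


# ActiveLine -> DOUT delays reported by Epoch.
delays_ns = {"hsramoe": 7.3, "ramoe": 16.0}

for part, delay in delays_ns.items():
    print(f"{part}: {delay} ns -> {cycles_needed(delay)} cycle(s)")
```

Under these assumptions the hsramoe delay fits within one clock period, while the ramoe delay just exceeds it and therefore needs a second cycle.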

Putting  the  VerilogOut  file  of  this  module  into  the 
original  PRC  behavioral  model  for  mixed-mode  simulation  caused 
a  timing  error  that  had  to  be  corrected  in  the  Bus  Interface 
Unit  behavioral  model.  After  an  upload  to  the  Data  List, 
data_line  must  remain  valid  long  enough  to  meet  the  data  hold 
time requirement of the Epoch part hsramoe. The geostat
information  is  shown  in  Figure  17. 

Bounding Box:
3834.792 x 3222.936 microns, 12359289.299 square microns.
150.976 x 126.887 mils, 19156.938 square mils.
Number of Pins = 282.
Number of unique cells = 3.
Number of Standard cells = 2
Number of Blocks = 2
Total Number of Instances = 4
Total number of nets = 269.
Total metal1 layer route length = 198805.54 microns.

Total  metal2  layer  route  length  =  52952.76  microns. 

Total  metal3  layer  route  length  =  0.00  microns. 

Total  route  length  =  251758.30  microns. 

Total  number  of  vias  =  728. 

Total  number  of  segments  =  2422. 
Reading  transistor  view  . . . 

Total  number  of  214712  transistors. 

0.089  Square  mils  per  Transistor. 

11.208  Transistors  per  square  mil. 
Power  Dissipation  =  2181481.250  micro-watts. 


Figure  17.   Data  List  Geostat  Information.  [Epoch  output] 


G.    BUS  INTERFACE 

This  module  connects  the  PRC  with  the  system  bus.  It 
handles  the  protocol  of  data  transfer  in  and  out  of  the  PRC. 

When  this  module  receives  a  fetch  signal,  it  latches  the 
address  in  the  NAR  and  requests  the  bus  for  a  burst  read.  It 
stores  the  incoming  data  until  all  four  bursts  have  been 
received.  Then  it  uploads  the  data  into  the  Data  List  and 
asserts fetch_done. If there is a parity error during the
fetch, the Bus Interface informs the Controller by asserting
fetch_abort. Also, the transaction is canceled.
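
The fetch sequence can be sketched in Python as a function that collects the four beats and reports either completion or an abort. This is an illustrative model only: the 64-bit beat width is inferred from four beats filling a 256-bit line, placing the first beat in the most significant position is an assumption, and the function name and return format are hypothetical.

```python
from typing import Iterable, Optional, Tuple

BEATS_PER_BURST = 4
BEAT_BITS = 64  # inferred: 256-bit line / 4 beats
BEAT_MASK = (1 << BEAT_BITS) - 1


def fetch_line(beats: Iterable[Tuple[int, bool]]) -> Tuple[str, Optional[int]]:
    """Assemble one line from (data, parity_ok) pairs in arrival order.

    Returns ("fetch_done", line) on success or ("fetch_abort", None)
    as soon as a parity error is seen.
    """
    line = 0
    count = 0
    for data, parity_ok in beats:
        if not parity_ok:
            return "fetch_abort", None
        # Assumption: the first beat ends up most significant.
        line = (line << BEAT_BITS) | (data & BEAT_MASK)
        count += 1
    if count != BEATS_PER_BURST:
        raise ValueError("a burst read carries exactly four beats")
    return "fetch_done", line


status, line = fetch_line([(1, True), (2, True), (3, True), (4, True)])
print(status, hex(line))
```

Only a completed line would be uploaded to the Data List; an aborted fetch leaves it untouched.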

When this module receives a send signal, it sends a
cancel signal (CANX) to the memory module, downloads data from
the Data List and then sends the data to the CPU. When the
transfer is finished, it asserts send_done.

The  coordination  of  these  activities  is  accomplished 
through  the  use  of  two  Finite  State  Machines.  One  acts  as  an 
address  bus  master.  The  other  controls  the  flow  of  data.  The 
geostat  information  is  shown  in  Figure  18. 


Bounding Box: -6264, -6408, 2246040, 1972980.
2252.304 x 1979.388 microns, 4458183.285 square microns.
88.673 x 77.929 mils, 6910.198 square mils.
Number of Pins = 448.
Number of unique cells = 56.
Number of Standard cells = 1393
[...]
Total route length = 1145559.87 microns.
Total number of vias = 9679.
Total number of segments = 44298.
Reading transistor view ...
Total number of 24403 transistors
0.283 Square mils per Transistor.
3.531 Transistors per square mil.
Power Dissipation = 237269.750 micro-watts.

Figure 18.   Bus Interface Geostat Information. [Epoch output]


H.     TESTING 

The  most  significant  large-scale  test  of  the  structural 
model  is  the  Prediction  Test,  which  is  similar  to  the 
Prediction  Test  of  the  behavioral  model.   The  test  runs  the 
same series of CPU transactions to exercise all functional
blocks of the PRC. The sequence of events for the Prediction
Test is included in the sequencer4.v file.

The  following  steps  are  required  to  conduct  a  test: 

1.  Change directories (cd) to the .../verilog/hardware/
directory on the Computer Center (CC) system.

2.  At the Unix command prompt, enter the command
verilog -f verilog_arguments.


The  Verilog-XL  output  of  the  test  is  included  in  the 
appendices.  This  test  shows  that  the  structural  model  of  the 
PRC  performs  the  desired  functions.  The  output  of  the 
structural  model  test  is  different  from  the  output  of  the 
behavioral  model  test  mainly  because  the  new  structural  model 
does  not  contain  the  same  display  commands.  These  commands 
interfere  with  the  Epoch  compilation  of  the  modules.  Other 
display  commands  were  added  to  the  Testbench,  which  is  still 
a behavioral model. The displays are sufficient to show that
the PRC performs as desired.

While  compiling  the  source  files,  Verilog-XL  reports  four 
warnings  about  implicit  wires  having  no  fanin.  These  wires 
are labeled NC0 and NC1, deriving their initials from
"not-connected." They are unused outputs on a couple of Epoch
parts. Therefore, these warnings can be ignored.

The  section  with  comments  about  SDF  Annotation  is  the 
result  of  incorporating  the  Epoch  timing  analysis  into  the 
Verilog  model.  Once  that  annotation  is  complete,  the  actual 
simulation  begins. 

The  error  messages  at  the  beginning  of  the  simulation  can 
be  ignored.   These  error  messages  are  generated  by  Epoch  parts 
and indicate improper signal values or timing. All these
errors  occur  before  the  system  hard  reset  and  are  expected. 
Having  those  errors  after  the  system  hard  reset  would  have 
indicated  a  real  problem. 

Once  the  system  has  reset,  the  CPU  starts  its  series  of 
transactions, beginning with reads from addresses 00h and 20h.
The  comment  "PRC  requested  the  bus"  indicates  that  the  PRC  is 
prefetching  data.  It  appears  that  the  prefetch  occurs  before 
the  start  of  the  second  CPU  transaction,  but  in  reality  it 
occurs  just  after  the  second  CPU  address  tenure,  which  is  not 
shown  in  the  report.  Also  not  shown  because  of  the  limitation 
of  display  commands  with  the  PRC  is  the  data  prefetched  by  the 
PRC. That the data is correct can be seen later in the
report,  when  the  PRC  sends  the  data  to  the  CPU. 

During  the  CPU  to  Memory  transactions,  there  is  60  ns 
between  each  of  the  four  beats  of  data.  When  the  CPU  reads 
from  address  40h,  the  speed  advantage  of  the  PRC  is 
demonstrated.  Note  that  there  is  now  only  15  ns  between  each 
beat. That is the period of the system clock, so one beat per
clock is the maximum rate at which the CPU can receive data.
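
The size of the speed advantage follows directly from the two beat intervals. The Python sketch below works the arithmetic for one four-beat burst; it measures only the time from the first beat to the last and ignores the initial access latency, which is not given here. The names are hypothetical.

```python
BEATS_PER_BURST = 4
MEMORY_BEAT_NS = 60  # interval between beats when the CPU reads main memory
PRC_BEAT_NS = 15     # interval between beats on a PRC hit (one clock period)


def first_to_last_beat_ns(beat_interval_ns: int, beats: int = BEATS_PER_BURST) -> int:
    """Time from the first beat to the last beat of one burst."""
    return (beats - 1) * beat_interval_ns


print(first_to_last_beat_ns(MEMORY_BEAT_NS))  # prints 180
print(first_to_last_beat_ns(PRC_BEAT_NS))     # prints 45
print(MEMORY_BEAT_NS / PRC_BEAT_NS)           # prints 4.0
```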

The write to address 1C0h occurred after the PRC had
prefetched  that  data.  The  PRC  should  have  flushed  the 
prefetched  data,  because  it  was  no  longer  valid.  Later,  when 
the  CPU  performs  a  read  from  the  same  address,  it  can  be  seen 
from  the  read  data  and  from  the  timing  (60  ns  per  beat)  that 
the  CPU  is  getting  the  data  from  main  memory.  In  accordance 
with  its  design,  the  PRC  did  not  try  to  give  the  stale  data  to 
the  CPU. 

V.   CAD TOOLS


The  three  primary  design  tools  used  in  the  development  of 
this  PRC  were  Verilog-XL,  cWaves  and  Epoch.  This  chapter 
describes  some  of  the  particularly  useful  features  of  these 
tools  and  gives  some  tips  for  using  these  tools  together. 

A.    VERILOG-XL 

Verilog-XL  allows  the  modeling  of  circuits  in  a 
programming  language.  Circuits  can  be  modeled  by  behavior  or 
structure.  For  the  complex  design  of  the  PRC,  it  was 
convenient  to  start  by  dividing  the  design  into  six  blocks  and 
then  using  Verilog  to  model  the  behavior  of  each  block.  This 
allowed  clarification  of  the  required  behaviors,  deferring  the 
search  for  hardware  solutions  until  after  the  desired 
behaviors  were  well  defined. 

Currently,  Verilog-XL  is  available  only  on  the  Computer 
Center  (CC)  network.  The  following  steps  make  it  easier  to 
use  from  an  Electrical  and  Computer  Engineering  (ECE) 
workstation:

1.  Add the following line to the .cshrc file in the ECE
account: alias rcc 'xhost in50204.cc.nps.navy.mil;
rlogin -l <username> in50204.cc.nps.navy.mil'.

2.  Re-source the session by typing "sc <return>".

3.  Type "rcc <return>" to log into the CC account.

4.  Add the following line to the .cshrc file in the CC
account: alias remote3 'setenv DISPLAY
sun3.ece.nps.navy.mil:0.0'. The .cshrc file can
contain similar lines for other workstations.

5.  Re-source  as  in  Step  2. 

Now  the  ECE  workstation  becomes  the  display  for  the  CC 
workstation. Typing "filemgr &" will call up the CC file
manager.

Typing  "verilog  <return>"  should  give  a  list  of  options 
for  use  with  Verilog-XL  and  will  verify  access  to  the  program. 
One  particularly  useful  option  is  to  put  all  the  arguments  in 
a file, such as verilog_arguments, and put the following line
in the CC .cshrc file:

alias veri 'verilog -f verilog_arguments'

Typing  "veri"  is  much  easier  than  listing  the  names  of  all  the 
files  that  need  to  be  included  in  the  simulation. 

The  Cadence  online  documentation  can  be  accessed  with  the 
command "openbook &". The Main Menu is the starting point.
The Alphabetical List on the bottom is the easiest way to find
the desired information. In this list there is a Verilog-XL
section which contains hyperlinks to the Verilog-XL Reference
Manual and Tutorial.

B.     CWAVES

This  tool  is  indispensable  for  the  analysis  of 
complicated  circuits.  There  is  nothing  like  seeing  a  timing 
diagram to track down design errors.

The  database  for  the  cWaves  Viewer  is  created  while 
running  the  Verilog  simulation.   The  highest  level  Verilog 
module should have the following two lines in an "initial"
block:

$shm_open;

$shm_probe(<name>, "AS");

where  <name>  is  the  instance  name  of  the  module  to  be 
observed.  More  information  about  these  $shm  commands  can  be 
found  in  the  cWaves  Reference  Manual,  which  is  a  little 
difficult  to  find.  It  is  in  the  Cadence  Online  Library 
accessed with "openbook & <return>". Once the Main Menu
appears,  select  the  Alphabetical  List  on  the  bottom.  The 
cWaves  Reference  Manual  is  filed  under  Composer  (Schematic 
Entry),  Design  Framework  II.  Section  4  of  this  manual  is 
particularly  useful. 

C.    EPOCH

A  circuit  designer  would  find  it  very  convenient  if  Epoch 
would  take  as  input  the  raw  behavioral  models,  but  it  does 
not. Each behavioral block must be converted into a
structural  model.  Then,  Epoch  can  automatically  generate  a 
Very  Large  Scale  Integrated  (VLSI)  circuit  layout  using  a  rule 
set  from  a  specific  manufacturer.  From  the  layout,  Epoch 
performs  a  timing  analysis  of  the  circuit  and  generates  a  new 
Verilog  file,  which  includes  the  timing  information.  This  new 
file  then  can  replace  the  behavioral  model  for  resimulation 
with  Verilog-XL.  This  allows  the  designer  to  verify  each 
block  as  it  is  designed.  CWaves  can  be  used  to  track  down 
timing  errors. 
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Epoch  is  available  on  the  ECE  system.  To  access  Epoch, 
add  " /tools3 /epoch/bin"  to  the  "set  path"  command  in  the 
.cshrc   file.   Also,  add  "setenv  CASCADE  /tools3 /epoch" . 

The Epoch User's Tutorial and the Epoch Verilog Interface
Reference are both very useful. The former is located at
/tools3/epoch/data/examples/tutorial. The latter can be
accessed through pull-down menus in Epoch:

Help => On-Line Manual...

Sometimes calling up this manual causes a FrameViewer error,
but the manual does come up after a slight delay.

The VerilogOut option proved very useful in the
development of the PRC. With this option, Epoch creates a new
Verilog file after laying out a design. The new model can be
inserted in place of the old behavioral model for simulation
with Verilog-XL. The Verilog Interface Reference describes
how this is done. In addition to the procedures described
there, a few extra steps are necessary.

1.  If the files must be moved from the vout directory to
another directory for simulation with Verilog-XL,
correct the $sdf_annotate path in the .v file.

2.  In all the behavioral files, add a `timescale
directive like the one in the .v file generated by
Epoch. This must appear before the "module"
statement.

3.  It may be necessary to copy primelib.v from
/tools3/epoch/data/verilog into the CC directory.
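
Step 2 can be sketched as follows. The module name and ports are hypothetical, and the 1ns / 10ps value is only an assumption for illustration; the actual value should be copied from the .v file that Epoch generates.

```verilog
// The `timescale directive must come before the module statement.
// 1ns / 10ps is an assumed value; use the one from the Epoch-generated file.
`timescale 1ns / 10ps

// Hypothetical behavioral module that is simulated alongside Epoch output.
module behavioral_block (clk, d, q);
  input  clk, d;
  output q;
  reg    q;

  always @(posedge clk)
    q <= d;                 // simple registered behavior
endmodule
```

Using the same directive in every file keeps the delays in the behavioral models on the same scale as the timing that Epoch writes out.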

The PowerPC uses bit zero as the most significant bit of
buses, so it was convenient to follow that convention in this
PRC design. For example, the PowerPC address bus is
designated A[0:31]. Unfortunately, this causes a problem with
the VerilogOut program, which reorders some of the indices and
connects buses in reverse order. This problem seems to be
unique to the VerilogOut file generation. The physical layout
itself gets connected correctly regardless of the index
numbering convention. Resolving this problem required
renumbering the indices of all modules used for Epoch input so
that the most significant bit had the highest index, such as
A[31:0].
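
The renumbering can be illustrated with a small sketch. The module names below are hypothetical. The sketch relies on the fact that Verilog connects vector ports by bit significance rather than by index, so a block renumbered as [31:0] can still be driven from a bus declared in the PowerPC style as [0:31].

```verilog
// Hypothetical module prepared for Epoch input: MSB has the highest index.
module epoch_block (addr, msb);
  input  [31:0] addr;
  output        msb;
  assign msb = addr[31];    // most significant bit of the address
endmodule

// Hypothetical test module using the PowerPC convention: bit 0 is the MSB.
module bus_order_demo;
  reg  [0:31] A;
  wire        msb;

  epoch_block U1 (A, msb);  // A[0] lines up with addr[31]

  initial
  begin
    A = 32'h8000_0000;      // only the most significant bit is set
    #1 $display("A[0] = %b, addr[31] inside block = %b", A[0], msb);
    $finish;
  end
endmodule
```

Only the declarations inside the Epoch input modules change; the surrounding testbench can keep the [0:31] notation.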
VI.  CONCLUSIONS AND RECOMMENDATIONS


A.    CONCLUSIONS 

In conclusion, the objective of this research has been
met. This thesis presents a complete hardware design for the
PRC. The simulation results show that the PRC can deliver
data to the CPU at the rate of 8-1-1-1, that is, eight cycles
for the first beat and one cycle for each of the remaining
three beats. This performance is better than the performance
of main memory (8-3-3-3). With a little more work on the
design, the PRC should be able to deliver data at a rate of
4-1-1-1.
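
The burst rates above can be compared as total cycles per four-beat burst. The sketch below is only illustrative arithmetic; the module name is hypothetical, and the 15 ns period is the bus clock period used in the testbench of Appendix B.

```verilog
// Hypothetical helper module: totals the cycles in a four-beat burst.
module burst_cycles;
  parameter T_NS = 15;      // bus clock period in ns (66 MHz testbench clock)
  integer prc, mem;

  initial
  begin
    prc = 8 + 1 + 1 + 1;    // PRC hit, 8-1-1-1
    mem = 8 + 3 + 3 + 3;    // main memory, 8-3-3-3
    $display("PRC:         %0d cycles (%0d ns)", prc, prc * T_NS);
    $display("Main memory: %0d cycles (%0d ns)", mem, mem * T_NS);
  end
endmodule
```

Under these assumptions a burst takes 11 cycles (165 ns) from the PRC and 17 cycles (255 ns) from main memory.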

Aguilar proposed a design consisting of six modules which
together would comprise the PRC. He took a bottom-up
approach, designing four of those six modules and testing each
independently, but not together. (Aguilar, 1995) As a result,
the designs of these modules require modifications to enable
them to function correctly together. Rather than redesigning
the four modules, the approach taken during this research was
top-down. That is, a single working behavioral model was
divided into six behavioral models that functioned together,
and then each of the six behavioral models was converted into
a hardware model. The result is still a six-module design,
but the six modules of this design have different functions
than the six modules of the design by Aguilar. The top-down
approach worked exceedingly well to clarify the design and to
minimize inter-module signal problems.

This research required a total of three academic
quarters. The work during the first quarter primarily
involved studying the problem, analyzing the design
requirements, and learning about the PowerPC system. Two more
quarters were required for the creation of the design, one
quarter each for the behavioral design phase and the
structural design phase.

Epoch  and  Verilog-XL  proved  reliable  and  highly  useful 
during  the  development  of  this  hardware  design.  Verilog-XL 
performed  the  simulations  necessary  to  verify  the  design. 
Epoch  performed  the  VLSI  circuit  layout  and  timing  analysis 
that  were  required  by  Verilog-XL  in  order  to  produce 
simulation  results  that  could  be  considered  accurate. 

Simulations with Verilog-XL are conveniently short while
testing small modules. However, simulations of the entire PRC
design typically ran for half an hour on a Sun SPARC-10
workstation. Similarly, on small designs Epoch runs fast enough
that a user can wait at the workstation. To compile
complex modules, Epoch requires much more time. For example,
Epoch takes over an hour to compile the Bus Interface of the
PRC and more than three hours to compile the entire PRC.

Both  Verilog-XL  and  Epoch  have  functions  and  options 
which  are  not  readily  apparent.  That  problem  is  compounded  by 
inadequate  indexes  in  the  user's  manuals  for  each  of  these 
tools.  On  the  other  hand,  the  tutorials  are  very  helpful  for 
revealing  some  of  those  functions  and  options. 

Some  of  the  options  in  Epoch  require  significant  studying 
before  use.  The  pull-down  menus  in  Epoch  could  be  better 
organized.  Both  of  these  characteristics  work  to  make  Epoch 
less  user-friendly  than  it  should  be. 
B.    RECOMMENDATIONS

As with any complex design, there is much more that a
designer could do to improve this PRC. This section describes
some areas of potential future research related to this
hardware design.

The first recommendation is to consider including the
Arbiter on the PRC chip. This PRC design was developed for a
PowerPC-603 microprocessor system, in which both the PRC and
the CPU are candidates for bus mastership. This requires that
there be a bus arbitration unit to prevent both devices from
trying to use the bus simultaneously. The bus arbitration
unit is a simple device whose function can be fulfilled with
a single finite state machine (FSM). It would be very easy to
add this FSM to the PRC chip, eliminating the requirement to
fabricate a separate integrated circuit chip.

The second recommendation concerns improving the
Line Manager design. The Line Manager is the block that
requires the wait states in the Controller State Diagram. The
impact of these wait states is a delay of three cycles in
determining if there is a hit within the PRC. Finding a way
of eliminating these wait states could improve the speed at
which the PRC delivers the first beat of data to the CPU and
the speed at which the PRC prefetches data from main memory.
Specifically, the performance would improve from 8-1-1-1 to
5-1-1-1. There is a strong chance that Epoch would prove useful
in this endeavor. Epoch has timing analysis routines and can
perform layouts in such a way as to minimize propagation
delays for critical signals. Epoch also has automatic buffer
sizing algorithms which could be used to ensure the output
signals of each part are buffered sufficiently to drive their
loads. These capabilities of Epoch do require considerable
CPU time. For example, running an automatic compilation on
the current design of the Bus Interface Unit takes over an
hour of actual CPU time on a Sun SPARC-10 workstation if the
buffer sizing option is selected.

The  next  recommendation  is  to  study  the  rest  of  the 
design  for  critical  paths.  With  Epoch  as  an  analysis  tool,  it 
should  be  uncomplicated  to  analyze  the  entire  PRC  for  critical 
timing  paths.  Some  timing  limitations  may  be  improved  through 
the  buffer-sizing  and  timing-critical  layout  capabilities  of 
Epoch.  Other  timing  limitations  may  require  modifying  the 
design.  The  current  PRC  design  includes  only  parts  that  were 
available  in  the  Epoch  library.  It  may  be  possible  to  design 
parts  that  outperform  the  Epoch  parts. 

The  final  recommendation  regards  fabrication.  If  the  PRC 
design  detailed  in  this  thesis  is  to  be  fabricated,  it  must 
undergo  two  steps.  First,  the  power  rails  should  be  studied 
using  Epoch  to  determine  if  there  is  a  requirement  for 
additional  power  and  ground  rails.  Second,  the  design  must  be 
put  inside  a  pad  ring.  Epoch  may  be  able  to  create  the  pad 
ring  automatically  with  minimal  intervention  by  the  designer. 
APPENDIX A.  LAYOUTS


This appendix contains the VLSI (Very-Large-Scale-Integrated)
circuit layouts for the PRC. These layouts were
all generated by Epoch.
Figure A1.  The PRC expanded to the first level. The four
blocks in the lower left corner, in order of decreasing size,
are the Bus Interface, Predictor, Snooper, and Controller.
[Epoch output]

Figure A2.  The PRC fully expanded. [Epoch output]

Figure A3.  The Controller. [Epoch output]

Figure A4.  The Snooper. [Epoch output]

Figure A5.  The Line Manager expanded one level. The bottom
portion is shown in more detail in the next figure.
[Epoch output]

Figure A6.  The Line Manager. Detail of the bottom portion
in the previous figure. [Epoch output]

Figure A7.  The Line Manager fully expanded. [Epoch output]

Figure A8.  The Predictor fully expanded. [Epoch output]

Figure A9.  The Data List fully expanded. [Epoch output]

Figure A10.  The Bus Interface fully expanded. [Epoch output]

Figure A11.  The Line Replacement Unit fully expanded.
[Epoch output]

Figure A12.  The Predicted Memory Address List fully
expanded. [Epoch output]

Figure A13.  The 128-to-7 Priority Encoder fully expanded.
[Epoch output]

Figure A14.  The Predicted Address Register fully expanded.
[Epoch output]
APPENDIX B.  TESTBENCH VERILOG FILES


This appendix contains the Verilog files for the
Testbench. They are all behavioral models, used together to
test the PRC design. The files are located on the Computer
Center system at joshua_u2/jrrobert/thesis/verilog/behavior.

A.  TESTBENCH


/******************************************************************************
 * TESTBENCH
 * Filename: testbench.v
 * Author: Joseph R. Robert, Jr.
 * Date:    24AUG95
 * Revised: 10JAN96
 * Purpose: This module is the highest level in the design hierarchy. It
 * emulates a complete computer system, composed of
 *   1. cpu: a PowerPC-603 microprocessor.
 *   2. ram: random access memory.
 *   3. arbiter: the bus arbitration unit.
 *   4. prc: the predictive read cache under design.
 *
 * System configuration and features:
 *   Single CPU
 *   64-bit data bus
 *   No out-of-order split-bus transactions.
 *   Synchronous interface: all I/O sampled on rising edge of bus clock.
 *   66 MHz system clock, 66 MHz CPU clock.
 *   Simulation should be done with a time unit = 1 ns.
 ******************************************************************************/

module testbench;

// Signal Declarations - conforms to PowerPC-603 notation

// Address Arbitration
wire CPU_BR_,   //Bus Request
     CPU_BG_;   //Bus Grant
tri1 ABB_;      //Address Bus Busy
tri1 TS_;       //Transfer Start (memory only, not I/O)

// Address bus
wire [0:31] A;  //Address (note Motorola's reverse notation)
wire [0:3]  AP; //Address Parity
wire APE_;      //Address Parity Error

// Transfer attributes
wire [0:4] TT;   //Transfer Type
wire [0:2] TSIZ; //Transfer Size
wire [0:1] TC;   //Transfer Code
tri1 TBST_;      //Transfer burst
wire GBL_,
     CI_,
     WT_,
     CSE;

// Address Termination
tri1 AACK_;     //Address Acknowledge
reg  ARTRY_;    //Address Retry

// Data Arbitration
wire CPU_DBG_;  //Data Bus Grant
reg  DBWO_;     //Data Bus Write Only
tri1 DBB_;      //Data Bus Busy

// Data Transfer
wire [0:63] D;  //Data
wire [0:7]  DP; //Data Parity
wire DPE_,      //Data Parity Error
     DBDIS_;    //Data Bus Disable

// Data Termination
tri1 TA_;       //Transfer Acknowledge
reg  DRTRY_;    //Data Retry
reg  TEA_;      //Transfer Error Acknowledge

// System control
reg  HRESET_;   //Hard Reset
wire PRC_BR_;   //PRC Bus Request
wire CANX;

//Declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

//Initialize values.
initial
begin
  DBWO_   = hi;  //Limits CPU to in-order transactions.
  TEA_    = hi;  //Only asserted for nonrecoverable bus error events.
  ARTRY_  = hi;  //Retries used only with multiprocessor or multi-
  DRTRY_  = hi;  // level memory systems.
  HRESET_ = hi;
end


//define system clock, 66 MHz, T = 15 ns.
reg clk;
initial clk = 1;
always
begin
  #7 clk = 0;
  #8 clk = 1;
end


//Connect parts
cpu     CPU1(CPU_BR_,CPU_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,
             CI_,WT_,CSE,AACK_,ARTRY_,CPU_DBG_,DBWO_,DBB_,D,DP,
             DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);
memory  MEM1(ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
             DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk);
arbiter ARB1(CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
             ABB_,DBB_,clk);
prc     PRC1(CPU_BR_,PRC_BR_,PRC_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,
             TBST_,AACK_,PRC_DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);


//run simulation
initial
begin
  //$shm_open;
  #5 HRESET_ = low;    //Reset entire system.
  #5 HRESET_ = hi;
  //#4000;
  //$shm_probe(PRC1,"AS");
  #152000 $finish;     //Adjust this time according to the instructions
                       //in the sequencers.
end

endmodule
B.  CPU


/******************************************************************************
 * PowerPC-603 CPU
 * Filename: cpu.v
 * Author: Joseph R. Robert, Jr.
 * Date:    24AUG95
 * Revised: 10JAN96
 *
 * Purpose: This module emulates the PowerPC-603 microprocessor. Note that
 * most signals are active low. This makes it slightly more difficult to work
 * one's way through all the double negatives in this code's conditional
 * statements, but makes it much easier to correlate against the timing diagrams
 * in the PowerPC-603 User Manual. This model uses the same notations for
 * signals that connect to other modules.
 * This module uses the sequencer module to determine the operations the CPU
 * will perform. This model of the PowerPC-603 is capable of performing reads,
 * writes, burst reads, and burst writes. It handles bus arbitration just like
 * the '603 including the pipelined address tenures. Please refer to the
 * PowerPC-603 User Manual for a detailed description of the nature and timing
 * of each signal.
 ******************************************************************************/

module cpu (BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
            ARTRY_,DBG_,DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);

// Signals are defined in system.v.
input  BG_,AACK_,DBG_,DBWO_,DBDIS_,TA_,ARTRY_,DRTRY_,TEA_,clk;
output BR_,APE_,CI_,WT_,CSE,DPE_;
inout  [0:31] A;
inout  [0:63] D;
inout  [0:7]  DP;
inout  [0:4]  TT;
inout  [0:3]  AP;
inout  [0:2]  TSIZ;
inout  [0:1]  TC;
inout  ABB_,TS_,TBST_,GBL_,DBB_;
reg    BR_,APE_,CI_,WT_,CSE,DPE_;
tri    [0:31] A;
tri    [0:63] D;
tri    [0:7]  DP;
tri    [0:4]  TT;
tri    [0:3]  AP;
tri    [0:2]  TSIZ;
tri    [0:1]  TC;
tri    ABB_,TS_,TBST_,GBL_,DBB_;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0,
          trace = FALSE;

//Address related
wire [0:31] seq_addr;
reg  [0:31] addr_reg, address[0:1];
reg  [0:31] a_reg;
assign A = a_reg;
reg  [0:3]  ap_reg, addr_parity_in, addr_parity_calc;
assign AP = ap_reg;

//Data related
reg  [0:63]  data [0:1];
wire [0:63]  seq_data;
reg  [0:63]  d_reg, load_data, data_reg;
assign D = d_reg;
reg  [0:255] line_reg, line [0:1];
wire [0:255] seq_line;
reg  [0:7]   dp_reg, d_parity_in, d_parity_calc;
assign DP = dp_reg;

//Other external control signals
reg Transfer_start [0:1];
reg abb_reg_, dbb_reg_, ts_reg_, tbst_reg_;
assign ABB_  = abb_reg_;
assign TS_   = ts_reg_;
assign DBB_  = dbb_reg_;
assign TBST_ = tbst_reg_;

reg  [0:4] Transfer_type [0:1];
wire [0:4] seq_Transfer_type;
reg  [0:4] tt_reg;
assign TT = tt_reg;
parameter //for Transfer_type
  none              = 5'bz,
  write             = 5'b00010,   //02
  write_atomic      = 5'b10010,   //12
  read              = 5'b01010,   //0A
  read_atomic       = 5'b11010,   //1A
  burst_write       = 5'b00110,   //06
  burst_read        = 5'b01110,   //0E
  burst_read_atomic = 5'b11110;   //1E

reg  [0:2] Transfer_size [0:1];
wire [0:2] seq_Transfer_size;
reg  [0:2] tsiz_reg;
assign TSIZ = tsiz_reg;

reg  [0:1] Transfer_code [0:1];
wire [0:1] seq_Transfer_code;
reg  [0:1] tc_reg;
assign TC = tc_reg;
parameter //for Transfer_code
  data_transfer     = 2'b00,
  touch_load        = 2'b01,
  instruction_fetch = 2'b10,
  reserved          = 2'b11;


//Other internal control signals
reg   need_bus_;
wire  need_bus_trigger_;
reg   AB_Master, DB_Master, Addr_termination;
wire  qual_BG_, qual_DBG_;
reg   [0:7] index;
wire  parked;
wire  pp;
reg   dpp;
event transfer_acknowledged;


//initialize signals
initial
begin
  a_reg            <= 32'bz;
  ap_reg           <= 4'bz;
  addr_parity_in   <= 4'bz;
  addr_parity_calc <= 4'bz;
  addr_reg         <= 32'bz;
  address[0]       <= 32'bz;
  address[1]       <= 32'bz;
  data[0]          <= 64'bz;
  data[1]          <= 64'bz;
  d_reg            <= 64'bz;
  line[0]          <= 256'bz;
  line[1]          <= 256'bz;
  line_reg         <= 256'bz;
  d_parity_in      <= 8'bz;
  d_parity_calc    <= 8'bz;
  dp_reg           <= 8'bz;

  APE_ <= 'bz;
  BR_  <= hi;
  CI_  <= hi;
  CSE  <= low;
  DPE_ <= 'bz;
  WT_  <= hi;
  abb_reg_  <= 'bz;
  dbb_reg_  <= 'bz;
  ts_reg_   <= 'bz;
  tbst_reg_ <= 'bz;
  Transfer_type[0] <= none;
  Transfer_type[1] <= none;
  tt_reg <= none;
  Transfer_size[0] <= 0;
  Transfer_size[1] <= 0;
  tsiz_reg <= 'bz;
  Transfer_code[0] <= reserved;
  Transfer_code[1] <= reserved;
  tc_reg <= 2'bz;
  Transfer_start[0] <= FALSE;
  Transfer_start[1] <= FALSE;

  AB_Master <= FALSE;
  DB_Master <= FALSE;
  Addr_termination <= FALSE;
  need_bus_ <= hi;
  dpp <= 0;
end

//
sequencer SEQ1(seq_Transfer_size,clk,pp,seq_addr,seq_data,seq_line,
               seq_Transfer_type,seq_Transfer_code,need_bus_trigger_,ABB_);

always @(negedge need_bus_trigger_)
begin
  address[pp]       <= seq_addr;
  data[pp]          <= seq_data;
  line[pp]          <= seq_line;
  Transfer_type[pp] <= seq_Transfer_type;
  Transfer_size[pp] <= seq_Transfer_size;
  Transfer_code[pp] <= seq_Transfer_code;
end


//
//ADDRESS BUS TENURE
// *** 1. Address bus arbitration
always @(negedge need_bus_trigger_)
  need_bus_ = low;

//Parked means that the CPU can take the bus as soon as it needs it.
assign parked = (!BG_ & ABB_ & ARTRY_);

//If CPU needs bus, it needs to assert BR_ only if not parked.
always @(posedge clk)
  if (BR_ == hi)
    BR_ = #7 ~(need_bus_==low & parked==FALSE);

assign qual_BG_ = ~(need_bus_==low & parked==TRUE);

//Assume mastership
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    abb_reg_ = #7 low;
    AB_Master = TRUE;
    BR_ <= #1 hi;
    need_bus_ <= #2 hi;
  end

// *** 2. Address Transfer
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    addr_reg = address[pp];
    addr_parity_calc[0] <= ~^addr_reg[0:7];
    addr_parity_calc[1] <= ~^addr_reg[8:15];
    addr_parity_calc[2] <= ~^addr_reg[16:23];
    addr_parity_calc[3] <= ~^addr_reg[24:31];
    ts_reg_ = #7 low;
    Transfer_start[pp] <= TRUE;
    a_reg    <= address[pp];
    ap_reg   <= addr_parity_calc;
    tt_reg   <= Transfer_type[pp];
    tsiz_reg <= Transfer_size[pp];
    tc_reg   <= Transfer_code[pp];
    if (Transfer_type[pp] == burst_read
        || Transfer_type[pp] == burst_write)
      tbst_reg_ <= low;
    //insert other address transfer characteristics here.
  end

always @(posedge clk)
  if (AB_Master & TS_==low)
  begin
    ts_reg_ = #7 hi;
    wait (AACK_==low);
    Addr_termination = TRUE;
  end

always @(posedge clk)
  if (Addr_termination)
  begin
    #7 ts_reg_ <= 'bz;
    a_reg      <= 'bz;
    ap_reg     <= 'bz;
    tt_reg     <= 'bz;
    tc_reg     <= 'bz;
    tsiz_reg   <= 'bz;
    tbst_reg_  <= 'bz;
    //insert other addr transfer characteristics here.
    abb_reg_ <= #2 hi;
    abb_reg_ <= #8 'bz;
    AB_Master = FALSE;
    Addr_termination = FALSE;
  end
//
//DATA BUS TENURE

assign qual_DBG_ = ~(!DBG_ & DBB_ & DRTRY_);

always @(posedge clk)
begin
  if (TA_ == low)
    -> transfer_acknowledged;
end

always
begin
  #2 dpp = ~dpp;
  case(Transfer_type[dpp])
    none: begin end

    //Note: TS is an implied data bus request. CPU can assume mastership if it
    //has a qualified data bus grant.

    read: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ <= #7 low;
      @(transfer_acknowledged) //latch data and terminate read
      data[dpp] <= D;
      data_reg <= D;
      d_parity_in <= DP;
      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] = FALSE;
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];
      if (trace) begin
        $display("CPU read %h from address %h.",
                 data[dpp],address[dpp]);
        $display("   Completed at time %d",$time);
      end
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
      if (d_parity_in != d_parity_calc)
      begin
        $display("CPU: data parity error.");
        $display("    Calculated parity: %b",
                 d_parity_calc);
        $display("    Received parity:   %b",
                 d_parity_in);
      end
    end

    write: begin
      data_reg = data[dpp];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ = #7 low;
      d_reg <= data[dpp];
      dp_reg <= d_parity_calc;
      @(transfer_acknowledged) //terminate write
      d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;
      Transfer_type[dpp] <= none;
      Transfer_start[dpp] = FALSE;
      if (trace) begin
        $display("CPU wrote %h to address %h.",
                 data[dpp],address[dpp]);
        $display("   Completed at time %d",$time);
      end
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    burst_read: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ <= #7 low;
      if (trace)
        $display("CPU started read from address %h at time %d.",
                 address[dpp],$time);
      repeat (4) begin
        @(transfer_acknowledged) //latch beat
        data[dpp] <= D;
        data_reg <= D;
        d_parity_in = DP;
        #1 if (trace)
          $display("     CPU read: %h at %d",data[dpp],$time);
        d_parity_calc[0] <= ~^data_reg[0:7];
        d_parity_calc[1] <= ~^data_reg[8:15];
        d_parity_calc[2] <= ~^data_reg[16:23];
        d_parity_calc[3] <= ~^data_reg[24:31];
        d_parity_calc[4] <= ~^data_reg[32:39];
        d_parity_calc[5] <= ~^data_reg[40:47];
        d_parity_calc[6] <= ~^data_reg[48:55];
        d_parity_calc[7] = ~^data_reg[56:61];
        #2 if (d_parity_in != d_parity_calc)
        begin
          $display("CPU: data parity error.");
          $display("    Calculated parity: %b",
                   d_parity_calc);
          $display("    Received parity:   %b",
                   d_parity_in);
        end
      end
      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] <= FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    burst_write: begin
      //wait for qualified data bus grant and transfer start.
      wait(qual_DBG_==low & Transfer_start[dpp]);
      if (trace)
        $display("CPU started write to address %h at time %d.",
                 address[dpp],$time);
      @(posedge clk) //assume data bus mastership
      dbb_reg_ = #6 low;
      line_reg = line[dpp];
      data_reg = line_reg[0:63];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      d_reg = line_reg[0:63];
      #1 if (trace)
        $display("     CPU write beat 1: %h at %d",d_reg,$time);
      @(transfer_acknowledged);      //first beat done

      data_reg = line_reg[64:127];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[64:127];
      #1 if (trace)
        $display("     CPU write beat 2: %h at %d",d_reg,$time);
      @(transfer_acknowledged);      //second beat done

      data_reg = line_reg[128:191];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[128:191];
      #1 if (trace)
        $display("     CPU write beat 3: %h at %d",d_reg,$time);
      @(transfer_acknowledged);      //third beat done

      data_reg = line_reg[192:255];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      #7 d_reg = line_reg[192:255];
      #1 if (trace)
        $display("     CPU write beat 4: %h at %d",d_reg,$time);
      @(transfer_acknowledged);      //fourth beat done
      d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;
      line_reg <= #7 256'bz;
      Transfer_type[dpp] <= #7 none;
      Transfer_code[dpp] <= #7 reserved;
      Transfer_start[dpp] <= #7 FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    default: $display("CPU module has bad TT[%b] = %b",dpp,
                      Transfer_type[dpp]," at time %d.",$time);
  endcase
end

endmodule


C.  ARBITER


/******************************************************************************
 * BUS ARBITRATION UNIT
 * Filename: arbiter.v
 * Author: Joseph R. Robert, Jr.
 * Date:    24AUG95
 * Revised: 10JAN96
 * Purpose: This module emulates the system's external bus arbitration unit.
 * It is implemented as a Finite State Machine.
 * There are only two possible bus masters in this system: the CPU and the PRC.
 * Also, the address bus and data bus are each arbitrated for independently,
 * though the data bus arbitration occurs after the corresponding address bus
 * arbitration.
 * If a unit wants the address bus, it asserts BR_. If the bus is available,
 * the arbiter asserts BG_ back to that unit, which can then take mastership by
 * asserting ABB_. When it is done with the address bus, it negates ABB_.
 * It is assumed that if a unit wanted the address bus, it will also want the
 * data bus. "Address only" transactions will not occur in this system, since
 * there is no external cache or multiprocessors. Therefore, after asserting
 * BG_ to the requesting unit, the arbiter asserts DBG_ on the next cycle.
 * BG_ and DBG_ are both asserted until the requesting unit takes mastership,
 * unless the requesting unit withdraws its request by negating BR_.
 * If there are no pending bus requests, the arbiter "parks" the CPU by
 * granting it the busses. This reduces memory access time for the CPU. If the
 * CPU is parked, and then the PRC requests the bus, the CPU is unparked, and
 * the arbiter can then grant the bus to the PRC.
 * The PowerPC can conduct a second address tenure long before the first data
 * tenure is complete. This pipelining has a maximum depth of two transactions,
 * meaning that a third address tenure will not start before the first data
 * tenure is complete. The Memory Unit in this Testbench is capable of handling
 * that situation. However, adding the PRC to the system creates the
 * possibility that the PRC will initiate a third address tenure before the
 * first of two CPU transactions is complete. This situation is handled by this
 * Arbiter, which keeps track of the pipelining depth. It will not grant the
 * address bus to any unit if that address tenure would put a third transaction
 * in the pipeline. Rather, the arbiter will stall until the data tenure from
 * the first transaction is complete, and then will grant the address bus to the
 * requesting unit.
 ******************************************************************************/

module arbiter (CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
                ABB_,DBB_,clk);

output CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
input CPU_BR_, PRC_BR_, ABB_, DBB_, clk;
reg CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
wire CPU_BR_, PRC_BR_, clk;

//Declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;
reg [1:0] requests;  //concatenated input signals

reg  [1:0]  depth; 
tri  stall; 

//Finite  State  Machine  variables  and  parameters 
reg  [2:0]  state,  next_state; 
parameter  start  =  1, 

grant_cpu_a  =  2, 

park_cpu     =  3, 

grant_cpu_d  =  4, 

grant_prc_a  =  5, 

wait_for_prc  =  6, 

grant_prc_d  =  7; 
//Initialize  outputs
initial 
begin 

CPU_BG_  <=  hi; 

CPU_DBG_  <=  hi; 

PRC_BG_  <=hi; 

PRC_DBG_  <=  hi; 

state      <=  start; 

next_state  <=  start; 

requests  <=  'b11;

depth  <=  0; 
end 

//Track  depth  of  pipeline 
always  @(posedge  ABB_) 

begin 
depth  =  depth  +  1; 

end 

always  @(posedge  DBB_) 
begin 

depth  =  depth  -  1; 
end 

assign  stall  =  (depth  >  1); 

// 
//Arbitration 

always 
begin 

wait  (!stall);

#5  state  =  next_state;

#1  case  (state)
start:  //1
begin 
CPU_BG_  <=hi; 
CPU_DBG_  <=  hi; 
PRC_BG_  <=hi; 
PRC_DBG_  <=  hi; 

@(posedge  clk)  requests  =  {CPU_BR_,PRC_BR_};
case  (requests)
2'b00:  next_state  =  grant_cpu_a;
2'b01:  next_state  =  grant_cpu_a;
2'b10:  next_state  =  grant_prc_a;
2'b11:  next_state  =  grant_cpu_a;
endcase
end

grant_cpu_a:  //2
begin 

CPU_BG_  <=low; 

CPU_DBG_  <=  hi; 

PRC_BG_  <=  hi; 

PRC_DBG_  <=  hi; 

@(posedge  clk);

next_state  =  park_cpu; 
end 

park_cpu:  //3 
begin 
CPU_BG_  <=  low; 
CPU_DBG_<=low; 
PRC_BG_  <=hi; 
PRC_DBG_  <=  hi; 

@(posedge  clk)  requests  =  {CPU_BR_,PRC_BR_};
case  (requests)
2'b00:  next_state  =  park_cpu;
2'b01:  next_state  =  park_cpu;
2'b10:  next_state  =  grant_cpu_d;
2'b11:  next_state  =  park_cpu;
endcase 
end 

grant_cpu_d:  //4 
begin 
CPU_BG_  <=  hi; 
CPU_DBG_  <=  low; 
PRC_BG_  <=  hi; 
PRC_DBG_<=hi; 

@(posedge  clk)  requests  =  {CPU_BR_,PRC_BR_};
case  (requests)
2'b00:  next_state  =  park_cpu;
2'b01:  next_state  =  park_cpu;
2'b10:  next_state  =  grant_prc_a;
2'b11:  next_state  =  park_cpu;
endcase 
end 

grant_prc_a:  //5
begin
CPU_BG_  <=  hi;
CPU_DBG_  <=  hi;
PRC_BG_  <=  low;
PRC_DBG_  <=  hi;
@(posedge  clk);
next_state  =  wait_for_prc;
end

wait_for_prc:  //6
begin 
CPU_BG_  <=hi; 
CPU_DBG_  <=  hi; 
PRC_BG_  <=low; 
PRC_DBG_  <=  low; 

@(posedge  clk)  requests  =  {CPU_BR_,PRC_BR_};
case  (requests)
2'b00:  next_state  =  wait_for_prc;
2'b01:  next_state  =  grant_cpu_d;
2'b10:  next_state  =  wait_for_prc;
2'b11:  next_state  =  grant_prc_d;
endcase 
end 

grant_prc_d:  //7
begin
CPU_BG_  <=  hi;
CPU_DBG_  <=  hi;
PRC_BG_  <=  hi;
PRC_DBG_  <=  low;
wait  (DBB_  ==  hi);

@(posedge  clk)  requests  =  {CPU_BR_,PRC_BR_};
case  (requests)
2'b00:  next_state  =  grant_cpu_a;
2'b01:  next_state  =  grant_cpu_a;
2'b10:  next_state  =  grant_prc_a;
2'b11:  next_state  =  grant_cpu_a;
endcase 
end 

default:  $display( "state  error  in  module  arbiter"); 

endcase 

end 

endmodule 
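
The pipeline-depth rule used by the arbiter can be illustrated with a small Python model. This is only a sketch and is not part of the thesis's Verilog sources; the names `DepthTracker` and `next_state_from_start` are hypothetical. It assumes, as in the listing above, that `depth` is a 2-bit counter incremented on each rising edge of ABB_ and decremented on each rising edge of DBB_, and that the request inputs are active low.

```python
class DepthTracker:
    """Illustrative model of the arbiter's 2-bit pipeline depth counter."""

    def __init__(self):
        # Address tenures whose data tenure has not finished yet.
        self.depth = 0

    def address_tenure_done(self):
        # Models "always @(posedge ABB_) depth = depth + 1" (2-bit register).
        self.depth = (self.depth + 1) & 0b11

    def data_tenure_done(self):
        # Models "always @(posedge DBB_) depth = depth - 1".
        self.depth = (self.depth - 1) & 0b11

    @property
    def stall(self):
        # Models "assign stall = (depth > 1)".
        return self.depth > 1


def next_state_from_start(cpu_br_n, prc_br_n):
    """Decision taken in the 'start' state; both inputs are active low."""
    requests = (cpu_br_n << 1) | prc_br_n
    # Only 2'b10 (CPU not requesting, PRC requesting) grants the PRC.
    return "grant_prc_a" if requests == 0b10 else "grant_cpu_a"
```

With two address tenures outstanding the model reports a stall, and the stall clears as soon as one data tenure completes, which is the condition under which the arbiter resumes granting the address bus.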


D.  MEMORY


/***************************************************************************
 * RANDOM ACCESS MEMORY
 * Filename: memory.v
 * Author: Joseph R. Robert, Jr.
 * Date:    24AUG95
 * Revised: 10JAN96
 *
 * Purpose: This module emulates the system's main memory. For simulation
 * efficiency, the memory has only enough physical address space for four burst
 * reads, that is, 128 bytes. The address bus width allows a virtual address
 * space of 4 G-bytes. Accesses to addresses past the first 128 bytes map to
 * within the first 128 bytes.
 *   The time required for memory accesses is determined by Delay1 and
 * Delay2. Delay1 is the delay, in cycles, required for the initial access.
 * Delay2 is the delay required for each successive beat of four-beat
 * operations. Set them both to 0 for fastest memory response. Set them to 8
 * and 3 respectively for realistic memory response of a 60 ns DRAM. Do not set
 * Delay2 > Delay1. That will not represent a realistic memory response, and
 * will probably cause this module to behave unpredictably.
 *   There is a two-stage pipeline involved with memory accesses, such that a
 * memory tenure can be started while the previous data tenure is still active.
 * To accomplish this, some signals have [0:1] in their declaration, and are
 * indexed using pp and dpp, which are the address pipeline position pointer
 * and the data pipeline position pointer, respectively.
 *   To keep this model simple, a single-beat read will always return a
 * single byte of data, regardless of TSIZ, in byte lane 0, which is different
 * from the way the PowerPC really operates. See Table 10-4 on pg. 10-15 of
 * the PowerPC-603 Users Manual for actual alignment. This simplification is
 * irrelevant to the performance of the PRC, which deals only with burst
 * operations.
 *   It is important to note that this memory module had to have one feature
 * that is not typical of memory modules. It has a CANX input which cancels the
 * current read operation. It is through this signal that the PRC stops the
 * memory module from delivering data to the CPU when the PRC already has the
 * data.
 ***************************************************************************/

module  memory  (ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_, 
DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk); 

//  Signals  are  defined  in  system.v. 

output  AACK_,DBDIS_,TA_,APE_; 

input  [0:1]  TC; 

input  DBWO_,CI_,WT_,CSE,TEA_,DPE_,CANX,clk; 

input  [0:31]  A; 

inout  [0:63]  D; 

inout [0:7] DP;
input [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,GBL_,DBB_;
wire [0:31] A;
wire CI_,WT_,CSE,TEA_,DPE_,ARTRY_;
reg AACK_,APE_,DBDIS_,DRTRY_;
tri [0:63] D;
tri [0:7] DP;
tri [0:3] AP;
tri [0:2] TSIZ;
tri ABB_,TS_,TBST_,GBL_,DBB_,TA_;
reg [0:63] d_reg, data;

assign D = d_reg;
// 

//Declare  variables,  constants,  parameters 
parameter TRUE   = 1'b1,
          FALSE  = 1'b0,
          hi     = 1'b1,
          low    = 1'b0,
          Size   = 128,  //Size of memory in bytes.
          Length = 7,    //Length of physical address in bits.
          Delay1 = 8,    //Delay for address translation.
          Delay2 = 3;    //Delay between successive beats.

parameter //for Transfer_type
          none              = 5'bzzzzz,
          write             = 5'b00010,
          write_atomic      = 5'b10010,
          read              = 5'b01010,
          read_atomic       = 5'b11010,
          burst_write       = 5'b00110,
          burst_read        = 5'b01110,
          burst_read_atomic = 5'b11110;

reg  [0:31]  virtual_addr,  index; 
reg  [0:3]  addr_parity_calc,addr_parity_in; 
reg [0:Length-1] pa_reg, physical_addr [0:1];
reg [0:7] mem [0:Size-1],
          mem_reg;       //Memory data register
reg  [0:4]  Transfer_type  [0:1]; 
reg  [0:2]  Transfer_size  [0:1]; 
reg  burst  [0:1]; 
reg  [0:1]  i,  burst_start; 

reg  pp,dpp;  //current  pipeline  and  data  pipeline  positions 
reg  abort; 
reg  ta_reg_; 

assign  TA_  =  ta_reg_; 

//Initialize  memory 
initial 
begin 

abort  <=  FALSE; 

AACK_  <=  hi; 

addr_parity_calc  <=  3'bz; 
addr_parity_in   <=  3'bz;
DBDIS_  <=  hi;
ta_reg_   <=  'bz;
d_reg  <=  64'bz;
Transfer_type[0]  <=  none;
Transfer_type[1]  <=  none;
Transfer_size[0]  <=  'bz;
Transfer_size[1]  <=  'bz;
burst[0]  <=  'bz;
burst[1]  <=  'bz;
pp  <=  1'b1;
dpp  <=  1'b1;

for  (index  =  0;  index<Size;  index=index+1)
mem[index]  =  index;
end 

// 

//ADDRESS  TENURE 
always  @(posedge  clk)
begin
if  (ABB_  ==  low)
begin 
//latch  address  and  attributes 
pp  =  ~pp; 

Transfer_type[pp]  <=  TT; 
Transfer_size[pp]  <=  TSIZ; 
burst  [pp]  <=  TBST_; 
//insert  other  attributes  here. 
addr_parity_in  <=  AP; 
virtual_addr  =  A; 

addr_parity_calc[0]  <=  ~^virtual_addr[0:7];
addr_parity_calc[1]  <=  ~^virtual_addr[8:15];
addr_parity_calc[2]  <=  ~^virtual_addr[16:23];
addr_parity_calc[3]  <=  ~^virtual_addr[24:31];
physical_addr[pp]  =  virtual_addr[32-Length:31]; 
if  (addr_parity_in  !=  addr_parity_calc) 
begin 
$display("Memory:  address  parity  error."); 
$display("   Calculated  parity:  %b",addr_parity_calc); 
$display("    Received  parity:    %b",addr_parity_in);
end
AACK_  =  #7  low;
wait  (AACK_  ==  hi);
end 
end 

always  @(posedge  clk)
begin
if  (AACK_  ==  low)
AACK_  =  #7  hi;
end

//DATA  TENURE 
always  @(posedge  clk)
begin
if  (CANX  ==  hi)
abort  =  TRUE; 
end 


always 
begin 
#1  dpp  =  ~dpp; 
#1  case  (Transfer_type[dpp]) 
none:  begin  end 

read: 
begin 
repeat(Delay1)@(posedge  clk);
#7  ta_reg_  <=  low;

d_reg[0:7]  <=  mem[physical_addr[dpp]];
Transfer_size[dpp]  <=  'bz;
@(posedge  clk)
Transfer_type[dpp]  <=  none;
#7  ta_reg_  =  'bz;
d_reg[0:7]  <=  'bz;
end

write:
begin
repeat(Delay1)@(posedge  clk);
#7  ta_reg_  <=  low;
@(posedge  clk)
//latch  data 
data  =  D; 

mem[physical_addr[dpp]]    <=  data[0:7]; 
#7  ta_reg_  =  'bz; 
Transfer_size[dpp]  <=  'bz; 
Transfer_type[dpp]  <=  none; 
end 

burst_read: 
begin 
//find  critical  double-word 
#2  pa_reg  =  physical_addr[dpp]; 
burst_start  =  pa_reg[Length-5:Length-4]; 
//align  to  cache  line 

pa_reg[Length-5:Length-l]  =  5'b00000; 
physical_addr[dpp]  =  pa_reg; 
if  (!abort)  if  (Delay1-Delay2-1  >=  0)
repeat(Delay1-Delay2-1)@(posedge  clk);

for  (index=0;  index<4;  index=index+1)
begin
if  (!abort)  repeat(Delay2)@(posedge  clk);
if  (Delay1-Delay2!=0  ||  index!=0)  @(posedge  clk);
if  (!abort)  begin
#7  ta_reg_  <=  low;
i  =  burst_start+index;  //i  is  mod  4
d_reg[ 0: 7]<=mem[physical_addr[dpp]+8*i];
d_reg[ 8:15]<=mem[physical_addr[dpp]+8*i+1];
d_reg[16:23]<=mem[physical_addr[dpp]+8*i+2];
d_reg[24:31]<=mem[physical_addr[dpp]+8*i+3];
d_reg[32:39]<=mem[physical_addr[dpp]+8*i+4];
d_reg[40:47]<=mem[physical_addr[dpp]+8*i+5];
d_reg[48:55]<=mem[physical_addr[dpp]+8*i+6];
d_reg[56:63]<=mem[physical_addr[dpp]+8*i+7];
if(Delay2!=0) 
begin 
ta_reg_<=#13  'bz; 
d_reg<=#13  64'bz; 
end 
end 
else 
index  <=  5; 
end 

@(posedge  clk)
ta_reg_  <=  #7  'bz;
d_reg  <=  #7  64'bz;
Transfer_size[dpp]  <=  'bz; 
Transfer_type[dpp]  <=  none; 
abort  <=  FALSE; 
end 

burst_write:
begin
//burst-writes  are  always  performed  in  order
if  (Delay1-Delay2>=0)
repeat(Delay1-Delay2)@(posedge  clk);
for  (index=0;  index<4;  index=index+1)
begin
repeat(Delay2)@(posedge  clk);
#7  ta_reg_  <=  low;
i  =  index;

@(posedge  clk)  //latch  data
data  =  D;

mem[physical_addr[dpp]+8*i]    <=  data[ 0: 7];
mem[physical_addr[dpp]+8*i+1]  <=  data[ 8:15];
mem[physical_addr[dpp]+8*i+2]  <=  data[16:23];
mem[physical_addr[dpp]+8*i+3]  <=  data[24:31];
mem[physical_addr[dpp]+8*i+4]  <=  data[32:39];
mem[physical_addr[dpp]+8*i+5]  <=  data[40:47];
mem[physical_addr[dpp]+8*i+6]  <=  data[48:55];
mem[physical_addr[dpp]+8*i+7]  <=  data[56:63];
if  (Delay2!=0) 

ta_reg_  <=  #7  'bz; 
end 
ta_reg_  <=  #7  'bz; 
data  <=  #7  64'bz; 
Transfer_size[dpp]  <=  'bz; 
Transfer_type[dpp]  <=  none; 
@(posedge  clk);
end 


default:  $display("Memory  module  received  bad  TT[%d]  =  %b",dpp, 
Transfer_type[dpp],"  at  time  %d",  $time); 
endcase 
end 

endmodule 
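
The address handling in the memory module (aliasing into 128 bytes, and critical-double-word-first ordering for burst reads) can be checked against a short Python model. This is a sketch under stated assumptions rather than part of the thesis sources: the function names are hypothetical, and it assumes the PowerPC bit numbering used in the listing, where bit 31 of the address is the least significant bit.

```python
SIZE = 128  # bytes of modeled physical memory (parameter Size in memory.v)


def physical_address(virtual_addr):
    # Models physical_addr = virtual_addr[32-Length:31]: keep the low 7 bits,
    # so every 32-bit address aliases into the first 128 bytes.
    return virtual_addr & (SIZE - 1)


def burst_read_order(virtual_addr):
    """Byte address of the double word delivered on each of the four beats."""
    pa = physical_address(virtual_addr)
    burst_start = (pa >> 3) & 0b11   # which double word was requested first
    line_base = pa & ~0x1F           # align to the 32-byte cache line
    # The 2-bit index i wraps modulo 4, so the requested double word is
    # returned first and the rest of the line follows in wrapped order.
    return [line_base + 8 * ((burst_start + beat) % 4) for beat in range(4)]


def beat_bytes(double_word_addr):
    # The initial block stores mem[index] = index, so after reset each byte
    # simply holds its own address.
    return list(range(double_word_addr, double_word_addr + 8))
```

For example, a burst read of address 0x10000058 aliases to physical address 0x58, so the double word at byte 88 is delivered first, followed by those at bytes 64, 72, and 80.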




APPENDIX  C.  PRC  BEHAVIOR  FILES 


The files in this appendix are the result of the
behavioral design phase. They include the Verilog behavioral
models of the PRC and the testing results. The files are
located on the Computer Center system at
joshua_u2/jrrobert/thesis/verilog/behavior.

A.    PRC 


/***************************************************************************
 * Predictive Read Cache
 * Filename: prc.v
 * Author: Joseph R. Robert, Jr.
 * Date:    02OCT95
 * Revised: 10JAN96
 *
 * Purpose: This module emulates the predictive read cache.
 ***************************************************************************/

module prc(CPU_BR_,BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,AACK_,
           DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);

// Signals are defined in system.v. Notations follow conventions used in
//   PowerPC Users Manual.

input CPU_BR_,BG_,AACK_,DBG_,TA_,HRESET_,clk;
output [0:1] TC;
output BR_,APE_,DPE_,CANX;
inout [0:31] A;
inout [0:63] D;
inout [0:7] DP;
inout [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,DBB_;
wire [0:1] TC;
wire BR_,APE_,DPE_,CANX;
wire [0:31] A;
wire [0:63] D;
wire [0:7] DP;
wire [0:4] TT;
wire [0:3] AP;
wire [0:2] TSIZ;
wire ABB_,TS_,TBST_,DBB_;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

//Other internal control signals
wire CAR_latch,predict,snoop_ignore;

wire  [0:255]  DATALINE; 

wire  [0:26]  CAR;  //current  address  register 

wire  [0:26]  NAR;  //  next  address  register 

wire  [0:26]  MRMA;  //most  recent  memory  access 

wire  [0:6]  ActiveLine; 

wire  [0:1]  BURSTSTART; 

//Connect  parts 
bus_interface BIU1(NAR,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_,
                   send,fetch,clk,BR_,upload,download,fetch_done,
                   send_done,CANX,snoop_ignore,
                   DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);
snooper SNP1(A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,read,write);
controller CON1(HRESET_,read,write,hit,send_done,fetch_done,
                line_empty,a_select,test,predict,store,
                flush,send,hold,new_replace,fetch,clk);
predictor PRE1(MRMA,CAR,predict,NAR);
line_mgr LM1(CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
             new_replace,MRMA,ActiveLine,line_empty,hit);
datalist DL1(DATALINE,ActiveLine,upload,download);

endmodule 


B.  CONTROLLER


/***************************************************************************
 * CONTROLLER
 * Filename: controller.v
 * Author: Joseph R. Robert, Jr.
 * Date:    21DEC95
 * Revised: 05JAN96
 *
 * Purpose: This module is a Finite State Machine which coordinates the actions
 * of all the other functional blocks of the PRC. All control signals are
 * synchronous with the system clock. HRESET_ causes the Controller to go to
 * the IDLE state. See state diagram and state output tables.
 ***************************************************************************/

module controller (HRESET_,read,write,hit,send_done,fetch_done,
                   line_empty,a_select,test,predict,store,
                   flush,send,hold,new_replace,fetch,clk);

input HRESET_,read,write,hit,send_done,fetch_done,line_empty,clk;
output a_select,test,predict,store,flush,send,hold,new_replace,fetch;

reg a_select,test,predict,store,flush,send,hold,new_replace,fetch;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0,
          trace = FALSE;

//Finite State Machine variable and parameters
reg [0:3] state, next_state;
reg [0:2] inputs3;
reg [0:1] inputs2;
reg input1;
parameter idle          = 0,
          test_car_r    = 1,
          send_data     = 2,
          test_nar      = 3,
          fetch_data    = 4,
          is_line_empty = 5,
          predict_na    = 6,
          store_car     = 7,
          test_car_w    = 8,
          flush_line    = 9;

//initialize  signals 
initial 
begin 
state  <=  idle;  //The  state  variables  must  be  initialized  to 

next_state  <=  idle;      //avoid  the  default  error  message, 
end 

//FINITE  STATE  MACHINE 
always @(negedge HRESET_)
begin
state <= idle;
next_state <= idle;
wait(HRESET_==hi);
end

always
begin
#2 state = next_state;
if (trace)
$display("Controller entered state %d.",state);
#1 case (state)
idle: //0
begin
//a_select   <= low;
test         <= low;
predict      <= low;
store        <= low;
flush        <= low;
send         <= low;
hold         <= low;
new_replace  <= low;
fetch        <= low;
@(posedge clk) inputs2 = {read,write};
if (HRESET_ == low)
next_state = idle;
else
case (inputs2)
2'b00: next_state = idle;
2'b01: next_state = test_car_w;
2'b10: next_state = test_car_r;
2'b11: next_state = test_car_w;  //This should not happen.
endcase
end 

test_car_r: //1
begin
a_select     <= low;  //CAR
test         <= hi;
predict      <= low;
store        <= low;
flush        <= low;
send         <= low;
hold         <= hi;
new_replace  <= low;
fetch        <= low;
@(posedge clk) input1 = hit;
case (input1)
1'b0: next_state = is_line_empty;
1'b1: next_state = send_data;

endcase 
end 

send_data:  //2 

begin
//a_select     <=  low;

test        <=  low; 

predict     <=  hi; 

store       <=  low; 

flush       <=  low; 

send        <=  hi; 

hold        <=  hi; 

new_replace  <=  low; 

fetch       <=  low; 

@(posedge clk) input1 = send_done;
case (input1)
1'b0: next_state = send_data;
1'b1: next_state = test_nar;
endcase 
end 

test_nar:  //3 
begin 
a_select     <=hi;  //NAR 
test        <=  hi; 
predict      <=  low; 
store        <=  low; 
flush       <=  low; 
send        <=  low; 
hold        <=  hi; 
new_replace  <=  low; 
fetch       <=  low; 

@(posedge clk) inputs3 = {hit,read,write};
case (inputs3)
3'b000: next_state = fetch_data;
3'b001: next_state = idle;
3'b010: next_state = idle;
3'b011: next_state = idle;  //This should not happen.
3'b100: next_state = idle;
3'b101: next_state = idle;
3'b110: next_state = idle;
3'b111: next_state = idle;  //This should not happen.
endcase 
end 

fetch_data:  //4 
begin 
a_select     <=hi;  //NAR 
test        <=  low; 
predict      <=  low; 
store       <=  low; 
flush       <=  low; 
send        <=  low; 
hold        <=  hi; 
new_replace  <=  low; 
fetch       <=  hi;
@(posedge clk) input1 = fetch_done;
case (input1)
1'b0: next_state = fetch_data;
1'b1: next_state = idle;
endcase 
end 

is_line_empty:  //5 
begin 

//a_select     <=  low; 

test        <=  low; 

predict      <=  low; 

store       <=  low; 

flush       <=  low; 

send        <=  low; 

hold        <=  hi; 

new_replace  <=  low; 

fetch       <=  low; 

@(posedge clk) input1 = line_empty;
case (input1)
1'b0: next_state = predict_na;
1'b1: next_state = store_car;

endcase 
end 

predict_na:  //6 
begin 

//a_select    <=  low; 

test        <=  low; 

predict      <=  hi; 

store        <=  low; 

flush       <=  low; 

send        <=  low; 

hold        <=  hi; 

new_replace  <=  hi; 

fetch        <=  low; 

@(posedge  clk)  next_state  =  test_nar;
end 

store_car:  //7 
begin 
a_select    <=low;  //CAR 
test        <=  low; 
predict      <=  low; 
store        <=hi; 
flush       <=  low; 
send         <=  low; 
hold        <=  hi; 
new_replace  <=  low; 
fetch       <=  low;
@(posedge  clk)  next_state  =  idle;
end 

test_car_w:  //8 
begin 
a_select     <=  low;  //CAR 
test        <=  hi; 
predict     <=  low; 
store       <=  low; 
flush       <=  low; 
send        <=  low; 
hold        <=  hi; 
new_replace  <=  low; 
fetch       <=  low; 
@(posedge clk) input1 = hit;
case (input1)
1'b0: next_state = idle;
1'b1: next_state = flush_line;
endcase 
end 

flush_line:  //9
begin 

//a_select     <=  low; 
test        <=  low; 
predict     <=  low; 
store       <=  low; 
flush       <=  hi; 
send        <=  low; 
hold        <=  hi; 
new_replace  <=  low; 
fetch       <=  low; 
@(posedge  clk)  next_state  =  idle;
end 

default: 
begin 

$display("state  error  in  module  controller."); 

$display("    state  =  %b.",state); 
end 

endcase 
end 

endmodule 
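
The controller's state transitions can be summarized as a single next-state function. The Python sketch below is illustrative only and is not one of the thesis files; the function name `next_state` and the use of strings for state names are assumptions made for readability. It follows the case statements in the listing above and assumes HRESET_ is inactive.

```python
def next_state(state, read=0, write=0, hit=0,
               send_done=0, fetch_done=0, line_empty=0):
    """Next controller state for the inputs sampled at a rising clock edge."""
    if state == "idle":
        if write:                      # 2'b01, and 2'b11 (should not happen)
            return "test_car_w"
        return "test_car_r" if read else "idle"
    if state == "test_car_r":          # read: is the CAR already predicted?
        return "send_data" if hit else "is_line_empty"
    if state == "send_data":           # wait until the PRC has sent the line
        return "test_nar" if send_done else "send_data"
    if state == "test_nar":            # fetch only on a NAR miss with no new request
        return "idle" if (hit or read or write) else "fetch_data"
    if state == "fetch_data":
        return "idle" if fetch_done else "fetch_data"
    if state == "is_line_empty":
        return "store_car" if line_empty else "predict_na"
    if state == "predict_na":
        return "test_nar"
    if state == "test_car_w":          # write: flush a matching line
        return "flush_line" if hit else "idle"
    if state in ("store_car", "flush_line"):
        return "idle"
    raise ValueError("state error in controller model: %r" % state)
```

A read miss on a line that already holds an address walks through test_car_r, is_line_empty, predict_na, test_nar, and fetch_data before returning to idle, which matches the prediction path described in the header comment.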




C.  SNOOPER


/***************************************************************************
 * SNOOPER
 * Filename: snooper.v
 * Author: Joseph R. Robert Jr.
 * Date:    21DEC95
 * Revised: 05JAN96
 *
 * Purpose: This module watches the system bus activity, and makes appropriate
 * reports to the PRC Controller.
 *   If the transaction is a data burst read or any kind of write, and if the
 * address parity is correct, then the read or write signal is asserted as
 * appropriate, and the address is placed in the CAR. The snoop_ignore signal
 * tells this unit to ignore the current transaction, because it was initiated
 * by the Bus Interface Unit. The snoop_ignore signal must be asserted
 * concurrently with the transfer attributes.
 *   Reads that are not burst or data related are ignored by the PRC. The CAR
 * is updated only on transactions relevant to the PRC.
 *   Due to the two-stage pipelining capability of the PowerPC, with respect to
 * memory accesses, a second address tenure can occur shortly after the first,
 * well before the first data tenure is complete. To compensate for this, the
 * read and write outputs of the Snooper will remain asserted until acknowledged
 * by the Controller with hold. The rising edge of hold indicates that the read
 * or write signal was received by the Controller. The Snooper can then negate
 * these signals, but must leave CAR alone until hold is negated. After hold is
 * negated, CAR can be updated to the new address.
 ***************************************************************************/

module  snooper  (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART, 
read_flag,write_flag); 

input  [0:31]  A; 

input  [0:3]  AP; 

input  [0:4]  TT; 

input  [0:1]  TC; 

input  TS_,snoop_ignore,hold,clk; 

output  [0:26]  CAR; 

output  [0:1]  BURSTSTART; 

output  read_flag,write_flag; 

reg  [0:26]  CAR; 

reg  [0:1]  BURSTSTART; 

reg  read_flag,write_flag; 

//declare  variables,  constants,  parameters 
parameter  TRUE  =  1'b1,
FALSE  =  1'b0,
hi    =  1'b1,
low   =  1'b0;

//Address  related 

reg  [0:31]  address; 

reg  [0:3]  addr_parity,addr_parity_calc; 

//Other  external  control  signals 
reg  [0:4]  Transfer_type; 
parameter  //for  Transfer_type 
none         =  5'bz, 
write        =  5'b00010,   //02
write_atomic  =  5'b10010,   //12
read         =  5'b01010,   //0A
read_atomic  =  5'b11010,   //1A
burst_write  =  5'b00110,   //06
burst_read   =  5'b01110,   //0E
burst_read_atomic  =  5'b11110;   //1E
reg  [0:1]  Transfer_code;
parameter  //for  Transfer_code
data_transfer     =  2'b00,
touch_load        =  2'b01,
instruction_fetch  =  2'b10,
reserved  =  2'b11;

reg  ignore; 

//Other  internal  control  signals 

reg  valid_read_0,  valid_read_l;    //The  numbers  indicate  the  pipeline  stage. 

reg  valid_write_0,  valid_write_l; 

tri  parity_valid; 

reg Transaction_waiting;

//initialize variables
initial
begin
CAR  <=  27'bz;
BURSTSTART  <=  2'bz;
read_flag  <=  low;
write_flag  <=  low;
address  <=  32'bz;
addr_parity  <=  4'bz;
addr_parity_calc  <=  4'bz;
Transfer_type  <=  none;
Transfer_code  <=  none;
ignore  <=  low;
Transaction_waiting  <=  low;
end 

//BEHAVIOR 
//Calculate  address  parity,
always  @  (address)
begin
addr_parity_calc[0]  <=  ~^address[0:7];
addr_parity_calc[1]  <=  ~^address[8:15];
addr_parity_calc[2]  <=  ~^address[16:23];
addr_parity_calc[3]  =  ~^address[24:31];
end 

assign  parity_valid  =  (addr_parity_calc  ==  addr_parity); 

//If  there  is  a  transaction, 

//  and  that  transaction  is  a  data  burst  read  or  any  kind  of  write 

//  and  the  transaction  is  not  initiated  by  the  PRC  itself, 

//  and  if  the  address  parity  is  correct 

//then  report  the  type  of  transaction  to  the  Controller. 

always @(posedge clk)
begin
if (TS_ == low)
begin  //latch address and attributes in stage 0.
address <= A;
Transfer_type <= TT;
Transfer_code <= TC;
ignore <= snoop_ignore;
addr_parity = AP;

#2 valid_read_0 = Transfer_code == data_transfer &
                  (Transfer_type == burst_read |
                   Transfer_type == burst_read_atomic);
valid_write_0 = Transfer_type == write |
                Transfer_type == write_atomic |
                Transfer_type == burst_write;
#4 if (!ignore & parity_valid & (valid_read_0 | valid_write_0))
Transaction_waiting = hi;
end 
end 

always  @(posedge  hold) 
begin 

read_flag  <=  low; 

write_flag  <=  low; 
end 


always 
begin 
wait(Transaction_waiting); 
valid_read_l  =  valid_read_0; 
valid_write_l  =  valid_write_0; 
Transaction_waiting  =  #2  low; 
wait(!hold);
if  (valid_read_l) 
begin 

read_flag  <=  hi; 
CAR  =  address[0:26]; 
BURSTSTART  =  address  [27:28]; 
end 
else  if  (valid_write_l) 
begin 
write_flag  <=  hi; 
CAR  =  address[0:26]; 
BURSTSTART  =  address[27:28]; 
end 
end 

endmodule 
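
The Snooper's filtering rules can be restated compactly in Python. The sketch below is an illustration, not part of the thesis sources, and its function names are hypothetical. It assumes the transfer-type encodings declared in the listing above, the PowerPC bit numbering in which address bit 0 is the most significant bit, and the reduction XNOR (`~^`) parity of the listing, which yields a 1 for a byte containing an even number of ones.

```python
DATA_TRANSFER = 0b00
WRITE, WRITE_ATOMIC, BURST_WRITE = 0b00010, 0b10010, 0b00110
BURST_READ, BURST_READ_ATOMIC = 0b01110, 0b11110


def address_parity(addr):
    """Four parity bits, one per address byte, most significant byte first."""
    bits = []
    for shift in (24, 16, 8, 0):
        byte = (addr >> shift) & 0xFF
        ones = bin(byte).count("1")
        bits.append(1 if ones % 2 == 0 else 0)  # reduction XNOR of the byte
    return bits


def classify(tt, tc):
    """Return 'read', 'write', or None for a transaction the PRC ignores."""
    if tc == DATA_TRANSFER and tt in (BURST_READ, BURST_READ_ATOMIC):
        return "read"
    if tt in (WRITE, WRITE_ATOMIC, BURST_WRITE):
        return "write"
    return None


def split_address(addr):
    """CAR is address[0:26] (the line address); BURSTSTART is address[27:28]."""
    return addr >> 5, (addr >> 3) & 0b11
```

Under these rules an instruction-fetch burst read and a single-beat data read are both ignored, while every kind of write is reported regardless of its transfer code.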


D.    LINE  MANAGER 


/***************************************************************************
 * LINE MANAGER
 * Filename: line_mgr.v
 * Author: Joseph R. Robert, Jr.
 * Date:    21DEC95
 * Revised: 05JAN95
 *
 * Purpose: This module contains the address list, status flags for each line
 * (Valid, Aged), a general status flag (line_empty), the line replacement unit,
 * and a couple of pointers (ActiveLine, ReplaceLine).
 *   The MRMA output is always the MRMA of the ActiveLine. The line_empty
 * flag indicates that the currently active line has no addresses in it yet, and
 * therefore cannot be used by the PRC to make a prediction.
 *   The input a_select determines which address input is used for a particular
 * operation. The two address inputs are the CAR and the NAR.
 *   When the Line Manager receives a test signal, it compares the input address
 * with the contents of the PredMA List. If there is a match with the CAR, it
 * asserts the hit signal, and changes the ActiveLine pointer to the line number
 * of the match.
 *   If there is a miss with the CAR, then the ActiveLine switches to the same
 * line pointed to by ReplaceLine.
 *   If, during a test, there is a match with the NAR, hit is asserted, and the
 * value in ActiveLine is irrelevant since it will not be used. If there is a
 * miss with the NAR, the ActiveLine must remain unchanged from the test.
 *   The fetch_done signal from the Bus Interface Unit causes the NAR to be
 * stored in PredMA[ActiveLine], the CAR to be stored in MRMA[ActiveLine], the
 * Valid flag to be set, and the Aged flag to be reset.
 *   The flush signal causes the current ActiveLine to become invalid by setting
 * Valid[ActiveLine] = 0.
 *   The store signal causes the input address to be stored into the MRMA of the
 * ActiveLine. This is only used for the first address in a new line. Store
 * also causes the line_empty flag to be reset.
 ***************************************************************************/

module line_mgr (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
                 new_replace,MRMA_out,ActiveLine,line_empty,hit);

input [0:26] CAR,NAR;
input HRESET_,a_select,test,fetch_done,flush,store,new_replace;
output [0:26] MRMA_out;
output [0:6] ActiveLine;
output line_empty,hit;

reg [0:26] MRMA_out;
reg [0:6] ActiveLine;
reg line_empty,hit;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

//Address  related 
reg  [0:26]  in_addr; 

//Data  structure 
reg  [0:26]  PredMA  [0:127], 
MRMA  [0:127], 
PredMA_reg,MRMA_reg; 
reg  Valid  [0:127], 
Aged  [0:127]; 

//Other  internal  control  signals 

reg [0:7] i1,i2,i3;
reg [0:6] ReplaceLine;
reg match,temp,all_lines_are_valid,done;

//initialize variables
initial
begin
for (i1=0; i1<=127; i1=i1+1)
begin
PredMA[i1] <= 27'b0;
MRMA[i1]   <= 27'b0;
end 
end 
//BEHAVIOR

always @(negedge HRESET_)
begin
for (i1=0; i1<=127; i1=i1+1)
begin
Valid[i1] <= low;
Aged[i1]  <= low;
end

ActiveLine  <= 0;
ReplaceLine <= 0;
line_empty  <= hi;
wait(HRESET_ == hi);
end 

always  @(a_select  or  CAR  or  NAR)  //address  multiplexer 
begin 
if  (a_select  ==  0)
in_addr  =  CAR; 
else 
in_addr  =  NAR; 
end 

always  @(ActiveLine) 
begin 

MRMA_out  =  MRMA[ActiveLine]; 

$display("Line_mgr  selected  new  ActiveLine  =  %d  at  %d",ActiveLine,$time);
end 

always  @(posedge  test) 
begin 
hit   =  low; 
match  =  low; 
#2  i2  =  0; 

while (!match & i2 < 128)
if (PredMA[i2] == in_addr & Valid[i2])
match = hi;
else
i2 = i2 + 1;
#2 if (match & a_select==0)  //a match with the CAR
begin
hit <= hi;
ActiveLine <= i2;
end
else if (match & a_select==1)  //a match with the NAR
hit <= hi;
else if (!match & a_select==0)  //a miss with the CAR
ActiveLine <= ReplaceLine;
else if (!match & a_select==1)  //a miss with the NAR
begin end  // Do nothing.
end

always  @(posedge  fetch_done) 
begin 

MRMA[ActiveLine]  <=  CAR; 

MRMA_out  <=  CAR; 

PredMA[ActiveLine]  <=  NAR; 

Valid [ActiveLine]  <=  hi; 

Aged[ActiveLine]  =  low; 
end 

always  @(posedge  flush) 
begin 
Valid[ActiveLine]  =  low; 

$display("Line  manager  flushed  line  %d  at  time  %d.",ActiveLine,$time); 
end 

always  @(posedge  store) 
begin 

MRMA[ActiveLine]  =  in_addr; 

MRMA_out  =  MRMA  [ActiveLine]; 

line_empty  =  0; 
end 

/*
 * LINE REPLACEMENT UNIT
 *
 * ReplaceLine always points to the line to be replaced at the next PRC miss.
 * As soon as the PRC starts predicting the first address for a line it
 * asserts new_replace, and the Line Replacement Unit can then find a new line
 * to mark as the next ReplaceLine. It searches sequentially for the next line
 * with invalid data and marks that line as the next to be replaced. If all
 * lines contain valid data, then it scans for the next line that is "aged",
 * indicated by a set Aged flag. As it scans for an aged line, it sets the Aged
 * bits in the lines it passes. Therefore, as it wraps around in search of an
 * aged line, it will eventually come upon one, even if none were aged when the
 * search began.
 * All of this occurs while the PRC is fetching data, so it has several clock
 * periods in which to complete the search.
 */

always
begin
  temp = TRUE;
  for (i3=0; i3<=127; i3=i3+1)
    if (!Valid[i3])
      temp = FALSE;
  #1 all_lines_are_valid = temp;
end

always @(posedge new_replace) // find the next ReplaceLine
begin
  done = FALSE;
  #2 while (!done)
  begin
    ReplaceLine = ReplaceLine + 1; //mod 128 addition
    if (!Valid[ReplaceLine])
      done = TRUE;
    else if (all_lines_are_valid & Aged[ReplaceLine])
      done = TRUE;
    else
      Aged[ReplaceLine] = 1;
  end
  line_empty = hi;
end

endmodule 
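
The replacement search described in the Line Replacement Unit comment can be tried in isolation. The following is a minimal sketch, not part of the thesis sources: the module name replace_demo, the task find_next and the reduction to 8 lines are assumptions made for illustration, while Valid, Aged, ReplaceLine and all_lines_are_valid follow line_mgr.v. The # delays of the original block are omitted.

```verilog
// Hypothetical stand-alone demo, not part of the thesis sources.
// It reuses the names Valid, Aged and ReplaceLine from line_mgr.v but
// shrinks the cache to 8 lines so the search is easy to follow.
module replace_demo;

  reg       Valid [0:7];
  reg       Aged  [0:7];
  reg [0:2] ReplaceLine;          // 3-bit counter, so +1 wraps mod 8
  reg       all_lines_are_valid, done;
  integer   k;

  // Same search as the "find the next ReplaceLine" block, without delays.
  task find_next;
    begin
      all_lines_are_valid = 1'b1;
      for (k = 0; k <= 7; k = k + 1)
        if (!Valid[k])
          all_lines_are_valid = 1'b0;

      done = 1'b0;
      while (!done)
        begin
          ReplaceLine = ReplaceLine + 1;
          if (!Valid[ReplaceLine])
            done = 1'b1;                       // empty line found
          else if (all_lines_are_valid & Aged[ReplaceLine])
            done = 1'b1;                       // aged line found
          else
            Aged[ReplaceLine] = 1'b1;          // age the line and move on
        end
    end
  endtask

  initial
    begin
      for (k = 0; k <= 7; k = k + 1)
        begin
          Valid[k] = 1'b1;
          Aged[k]  = 1'b0;
        end
      ReplaceLine = 0;

      find_next;   // every line valid, none aged
      $display("cache full, nothing aged: ReplaceLine = %d", ReplaceLine);

      Valid[5] = 1'b0;   // pretend line 5 was flushed
      find_next;
      $display("line 5 flushed:           ReplaceLine = %d", ReplaceLine);
    end

endmodule
```

Tracing the loop by hand, the first call should wrap once around the eight lines, ageing each one, and stop at line 1; the second call should stop at the flushed line 5 because an invalid line always ends the search.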


E. PREDICTOR


/*
 * PREDICTOR
 * Filename: predictor.v
 * Author: Joseph R. Robert, Jr.
 * Date:    21DEC95
 * Revised: 05JAN96
 *
 * Purpose: This module calculates the Next Address (stored in NAR) based on the
 * Most Recent Memory Access (MRMA) and the Current Address (in the CAR). The
 * prediction calculation is
 *     NAR = 2*CAR - MRMA
 * The calculation is updated upon each rising edge of the predict signal.
 * The output NAR remains latched and valid until the next predict leading edge.
 */

module predictor (MRMA,CAR,predict,NAR);

input [0:26] MRMA,CAR;
input predict;
output [0:26] NAR;

reg [0:26] NAR;

parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          trace = FALSE;

//  behavior 

always  @(posedge  predict) 
begin 
NAR  =  2*CAR  -  MRMA; 
if  (trace) 
begin 
$display( "Predictor:  NAR      =  2*CAR       -  MRMA"); 

$display("  %h  =  2*%h  -  %h",{NAR,5'b0},{CAR,5'b0},{MRMA,5'b0});

end 
end 

endmodule 
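
The prediction NAR = 2*CAR - MRMA is a constant-stride extrapolation: it equals CAR + (CAR - MRMA). A short sketch, assuming a hypothetical test module predictor_demo and task predict_and_show (neither is in the thesis sources), applies it to address pairs taken from Sequence #4. Addresses are held as 27-bit line addresses, so a byte address such as 20h appears as 1.

```verilog
// Hypothetical demo, not part of the thesis sources.
module predictor_demo;

  reg [0:26] MRMA, CAR, NAR;   // 27-bit line addresses, as in predictor.v

  // Apply the prediction and print all three values as 32-bit byte addresses.
  task predict_and_show;
    begin
      NAR = 2*CAR - MRMA;      // same as CAR + (CAR - MRMA)
      $display("%h = 2*%h - %h", {NAR,5'b0}, {CAR,5'b0}, {MRMA,5'b0});
    end
  endtask

  initial
    begin
      MRMA = 27'h0; CAR = 27'h1;   // reads of 00h then 20h
      predict_and_show;            // stride +20h, so 40h is predicted

      MRMA = 27'hC; CAR = 27'hD;   // reads of 180h then 1A0h
      predict_and_show;            // stride +20h, so 1C0h is predicted

      MRMA = 27'h3; CAR = 27'h2;   // reads of 60h then 40h (descending)
      predict_and_show;            // stride -20h, so 20h is predicted
    end

endmodule
```

The third case is not part of Sequence #4; it is included only to show that the same formula follows a descending stride.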


F. DATA LIST


/*
 * DATA LIST
 * Filename: datalist.v
 * Author: Joseph R. Robert, Jr.
 * Date:    15DEC95
 * Revised: 05JAN96
 *
 * Purpose: This module emulates the PRC's Data List.
 *
 * An upload signal causes the Data List to store the data on data_line into
 * the address specified by ActiveLine.
 * A download signal causes the Data List to assert onto data_line the data in
 * the address specified by ActiveLine.
 */

module datalist (data_line,ActiveLine,upload,download);

input [0:6] ActiveLine;
input upload,download;
inout [0:255] data_line;

tri [0:255] data_line;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0,
          trace = TRUE;

//Data structure
reg [0:255] line [0:127],
            line_reg,
            data_line_reg;
assign data_line = data_line_reg;

//initialize  signals 
initial 

begin 
data_line_reg  <=  256'bz; 

end 

//BEHAVIOR 
always @(posedge upload)
begin
  line_reg = data_line;
  line[ActiveLine] = line_reg;
  if (trace) begin
    $display("DATALIST uploaded this data into line %h at time %d.",
             ActiveLine,$time);
    $display("      %h",line_reg);
  end
end

always @(posedge download)
begin
  line_reg = line[ActiveLine];
  data_line_reg = line_reg;
  if (trace) begin
    $display("DATALIST downloaded this data from line %h at time %d.",
             ActiveLine,$time);
    $display("      %h",line_reg);
  end
end

always  @(negedge  download) 
begin 

data_line_reg  =  256'bz; 
end 

endmodule 
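
Both the Data List and the Bus Interface Unit drive the shared 256-bit line through a register that is set to z when idle. The sketch below is illustrative only and shows that idiom on an 8-bit bus; the module tristate_demo and the names list_drv and biu_drv are hypothetical and do not appear in the thesis sources.

```verilog
// Hypothetical demo of two register-backed drivers sharing one tri net.
module tristate_demo;

  tri [0:7] data_line;
  reg [0:7] list_drv, biu_drv;   // stand-ins for the two data_line_reg registers

  assign data_line = list_drv;
  assign data_line = biu_drv;

  initial
    begin
      list_drv = 8'bz;           // both sides released: the net floats
      biu_drv  = 8'bz;
      #1 $display("both released : %b", data_line);

      biu_drv = 8'hA5;           // "upload": only the BIU side drives
      #1 $display("upload side   : %h", data_line);

      biu_drv  = 8'bz;           // "download": only the Data List side drives
      list_drv = 8'h3C;
      #1 $display("download side : %h", data_line);

      list_drv = 8'bz;           // release again, as on negedge download
      #1 $display("released again: %b", data_line);
    end

endmodule
```

The point of the idiom is that a driver holding z contributes nothing to the net, so the two modules never contend as long as only one of them holds a real value at a time.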




G. BUS INTERFACE UNIT


/*
 * BUS INTERFACE UNIT
 * Filename: bus_interface.v
 * Author: Joseph R. Robert, Jr.
 * Date:    09OCT95
 * Revised: 05JAN96
 *
 * Purpose: This module connects the PRC with the system bus. It handles
 * the protocol of data transfer in and out of the PRC.
 * When this module receives a fetch signal, it latches the address in the
 * NAR, and requests the bus for a burst read. It stores the incoming data
 * until all four beats have been received. Then it uploads the data into the
 * Data List and asserts fetch_done.
 * When this module receives a send signal, it sends a cancel signal (CANX) to
 * the memory module, downloads data from the Data List, and then sends the data
 * to the CPU. When the transfer is finished, it asserts send_done.
 * The coordination of these activities is accomplished through the use of a
 * Finite State Machine.
 */

module  bus_interface  (NAR_IN,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_, 
send,fetch,clk,BR_,upload,download,fetch_done, 
send_done,CANX,snoop_ignore, 
DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

//  Signals  are  defined  in  system.v. 

input  [0:26]  NAR_IN; 

input  [0:1]  BURSTSTART; 

input  BG_,CPU_BR_,AACK_,DBG_,send,fetch,clk,HRESET_; 

output  BR_,upload,download,fetch_done; 

output  send_done,DPE_,CANX,snoop_ignore; 

inout  [0:255]  DATALINE; 

inout  [0:63]  D; 

inout  [0:31]  A; 

inout  [0:7]  DP; 

inout  [0:4]  TT; 

inout  [0:2]  TSIZ; 

inout ABB_,TS_,TBST_,DBB_,TA_;

reg  BR_,upload,download,fetch_done,send_done,CANX,snoop_ignore; 

tri  [0:255]  DATALINE; 

tri  [0:63]  D; 

tri [0:31] A;

tri [0:7] DP;

tri [0:4] TT;

tri [0:3] AP;

tri [0:2] TSIZ;

tri ABB_,TS_,TBST_,DBB_,TA_,DPE_;

//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0,
          trace = TRUE;

//Address  related 
reg  [0:31]  NAR; 
reg  [0:31]  a_reg; 

assign  A  =  a_reg; 
reg  [0:3]  ap_reg,  addr_parity_calc; 

assign  AP  =  ap_reg; 
reg  [0:1]  burst_start; 

//Data  related 

reg [0:255] data_line_reg, data_line;

assign DATALINE = data_line_reg;
reg [0:63] d_reg,data_reg;

assign D = d_reg;
reg [0:7] dp_reg, data_parity_calc, data_parity_in;

assign DP = dp_reg;

//Other external control signals
reg [0:4] tt_reg,Transfer_type;
assign TT = tt_reg;
parameter //for Transfer_type
  none              = 5'bz,
  burst_write       = 5'b00110,  //06
  burst_read        = 5'b01110,  //0E
  burst_read_atomic = 5'b11110;  //1E
reg  [0:2]  tsiz_reg; 

assign  TSIZ  =  tsiz_reg; 
reg  abb_reg_,dbb_reg_,ts_reg_,tbst_reg_,ta_reg_; 
assign  ABB_  =  abb_reg_; 
assign  DBB_  =  dbb_reg_; 
assign  TS_  =  ts_reg_; 
assign  TBST_  =  tbst_reg_; 
assign  TA_   =  ta_reg_; 

//Other  internal  control  signals 

reg  [0:2]  i;  //counter 

reg  [0:1]  j; //counter 

wire  qual_BG_,qual_DBG_; 

reg  AB_Master,Transfer_in_progress,Transfer_start,Addr_termination, 

Data_Parity_Error; 

assign  DPE_  =  ~Data_Parity_Error; 
event transfer_acknowledged,start_send;

//Finite State Machine variable and parameters
reg [0:3] state, next_state;
reg [0:1] inputs2;
reg input1;
parameter idle   = 0,
          fetch1 = 1,
          fetch2 = 2,
          fetch3 = 3,
          send1  = 5,
          send2  = 6;

//initialize  signals 
initial 
begin 

BR_  <=  hi; 

upload  <=  low; 

download  <=  low; 

fetch_done  <=  low; 

CANX  <=  low; 

NAR  <=  32'bz; 

a_reg  <=  32'bz; 

ap_reg  <=  4'bz; 

addr_parity_calc  <=  4'bz; 

burst_start  <=  2'bz; 

data_line_reg  <=  256'bz; 

data_line <= 256'bz;

d_reg <= 64'bz;

data_reg <= 64'bz;

dp_reg  <=  8'bz; 

data_parity_calc  <=  8'bz; 

data_parity_in  <=  8'bz; 

tt_reg  <=  5'bz; 

tsiz_reg  <=  3'bz; 

abb_reg_  <=  'bz; 

dbb_reg_  <=  'bz; 

ts_reg_  <=  'bz; 

tbst_reg_  <=  'bz; 

ta_reg_  <=  'bz; 

i  <=  3'bz; 

j  <=  2'bz; 

AB_Master  <=  low; 

Transfer_in_progress  <=  low; 

Transfer_start  <=  low; 

Addr_termination  <=  low; 

Data_Parity_Error  <=  low; 

send_done  <=  low; 

snoop_ignore  <=  low; 

state  <=  0; 
next_state <= 0;
inputs2 <= 2'bz;
input1 <= 'bz;
end 

//ADDRESS  BUS  ARBITRATION 

assign qual_BG_ = ~(!BG_ & ABB_);

//Assume mastership
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    abb_reg_ = #2 low;
    AB_Master = TRUE;
    BR_ <= #1 hi;
  end

//Calculate address parity.
always @(NAR)
begin
  addr_parity_calc[0] <= ~^NAR[0:7];
  addr_parity_calc[1] <= ~^NAR[8:15];
  addr_parity_calc[2] <= ~^NAR[16:23];
  addr_parity_calc[3] =  ~^NAR[24:31];
end

//Transfer address
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    ts_reg_ = #7 low;
    Transfer_start <= TRUE;
    a_reg <= NAR;
    ap_reg <= addr_parity_calc;
    tt_reg <= burst_read;
    tsiz_reg <= 3'b010;
    tbst_reg_ <= low;
    snoop_ignore <= hi;
    if (trace)
      $display("BIU started read from address %h at time %d.",
               NAR,$time);
  end

always @(posedge clk)
  if (AB_Master & TS_==low)
  begin
    ts_reg_ = #7 hi;
    wait (AACK_==low);
    Addr_termination = TRUE;
  end


//Address  termination 
always @(posedge clk)
if  (Addr_termination) 
begin 

#7  ts_reg_  <=  'bz; 

a_reg       <=  'bz; 

ap_reg      <=  'bz; 

tt_reg      <=  'bz; 

tsiz_reg    <=  'bz; 

tbst_reg_  <=  'bz; 

snoop_ignore  <=  low; 

//insert  other  addr  transfer  characteristics  here. 

abb_reg_<=#2  hi; 

abb_reg_  <=  #8  'bz; 

AB_Master  =  FALSE; 

Addr_termination  =  FALSE; 
end 

//DATA  BUS  ARBITRATION  FOR  FETCHES 

assign qual_DBG_ = ~(!DBG_ & DBB_);

always @(posedge clk)
begin
  if (TA_ == low)
    -> transfer_acknowledged;
end


//calculate data parity. Odd parity, including parity bit.

always @(data_reg)
begin
  data_parity_calc[0] <= ~^data_reg[0:7];
  data_parity_calc[1] <= ~^data_reg[8:15];
  data_parity_calc[2] <= ~^data_reg[16:23];
  data_parity_calc[3] <= ~^data_reg[24:31];
  data_parity_calc[4] <= ~^data_reg[32:39];
  data_parity_calc[5] <= ~^data_reg[40:47];
  data_parity_calc[6] <= ~^data_reg[48:55];
  data_parity_calc[7] =  ~^data_reg[56:63];
end


always
begin
  //wait for qualified data bus grant and transfer start.
  wait(qual_DBG_==low & Transfer_start);
  @(posedge clk) //assume data bus mastership
  dbb_reg_ <= #7 low;
  i = 0;

  while (i<4)
  begin
    @(transfer_acknowledged) //latch beat
    data_reg <= D;
    data_parity_in = DP;

    #2 if (trace) $display("     BIU: %h at %d",data_reg,$time);
    #2 if (data_parity_in != data_parity_calc)
    begin
      $display("BIU: data parity error.");
      $display("   Calculated parity: %b",
               data_parity_calc);
      $display("    Received parity:   %b",
               data_parity_in);
      Data_Parity_Error = TRUE;
      i = 4;
    end
    else
    begin
      if (i==0) data_line[  0: 63] = data_reg;
      if (i==1) data_line[ 64:127] = data_reg;
      if (i==2) data_line[128:191] = data_reg;
      if (i==3) data_line[192:255] = data_reg;
      i = i+1;
    end
  end

  Transfer_in_progress <= FALSE;
  Transfer_start <= FALSE;
  dbb_reg_ = #4 hi;
  dbb_reg_ = #8 'bz;
end


//DATA  BUS  PROTOCOL  FOR  SENDS  (PRC  acting  as  memory  module) 
always @(start_send)
begin
  i = 0;
  while (i<4) begin
    @(posedge clk);
    #1 ta_reg_ = 'bz;
    j = burst_start+i; //j is mod 4
    if (j==0) data_reg = data_line[  0: 63];
    if (j==1) data_reg = data_line[ 64:127];
    if (j==2) data_reg = data_line[128:191];
    if (j==3) data_reg = data_line[192:255];
    d_reg = data_reg;
    #4 dp_reg <= data_parity_calc;
    ta_reg_ <= low;
    i = i+1;
  end

  send_done <= hi;
  @(posedge clk)
  ta_reg_ <= #7 'bz;
  d_reg <= #7 64'bz;
  dp_reg <= #7 8'bz;
end


//FINITE  STATE  MACHINE 
always @(negedge HRESET_)
begin
  if (HRESET_ == low)
  begin
    state <= idle;
    next_state <= idle;
    wait(HRESET_ == hi);
  end
end

always
begin
  #2 state = next_state;

  #1 case (state)
    idle: //0
    begin
      upload <= low;
      fetch_done <= low;
      send_done <= low;
      CANX <= low;
      data_line_reg <= 256'bz;
      @(posedge clk) inputs2 = {send,fetch};
      case (inputs2)
        2'b00: next_state = idle;
        2'b01: next_state = fetch1;
        2'b10: next_state = send1;
        2'b11: next_state = idle; //This should not happen.
      endcase
    end

    fetch1: //1
    begin
      //1. Latch next address.
      NAR[0:26] <= NAR_IN;
      NAR[27:31] <= 5'b0;
      //2. Request bus.
      BR_ <= low;
      Transfer_in_progress <= TRUE;
      @(posedge clk)
      next_state = fetch2;
    end

    fetch2: //2
    begin
      //1. Wait for all data to be received.
      @(posedge clk) input1 = Transfer_in_progress;
      case (input1)
        1'b0: next_state = fetch3;
        1'b1: next_state = fetch2;
      endcase
    end

    fetch3: //3
    begin
      //1. Upload the data line.
      data_line_reg <= data_line;
      upload <= hi;
      //2. Assert fetch_done.
      fetch_done <= hi;
      @(posedge clk)
      next_state = idle;
    end

    send1: //5
    begin
      //1. Cancel the memory access.
      CANX <= hi;
      //2. Latch burst_start.
      burst_start <= BURSTSTART;
      //3. Download data from the data list.
      download <= hi;
      #5 data_line <= DATALINE;
      @(posedge clk)
      next_state = send2;
    end

    send2: //6
    begin
      //1. Send data to CPU.
      -> start_send;
      CANX <= low;
      download <= low;
      @(posedge clk) input1 = {send_done};
      case (input1)
        1'b0: next_state = send2;
        1'b1: next_state = idle;
      endcase
    end

    default:
    begin
      $display("state error in module bus_interface.");
      $display("    state = %b.",state);
    end
  endcase
end


endmodule 
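
Two small details of the Bus Interface Unit are easy to check in isolation: the order in which a send walks the four doublewords, and the parity bit produced by the reduction XNOR. The following sketch is an illustration under assumed names (the module biu_demo and the registers b and p are hypothetical); only burst_start, i and j mirror bus_interface.v.

```verilog
// Hypothetical demo, not part of the thesis sources.
module biu_demo;

  reg [0:1] burst_start, j;   // same widths as in bus_interface.v
  reg [0:2] i;
  reg [0:7] b;                // one data byte
  reg       p;                // its parity bit

  initial
    begin
      // Send order: start at doubleword 2 and wrap around.
      burst_start = 2'd2;
      for (i = 0; i < 4; i = i + 1)
        begin
          j = burst_start + i;      // 2-bit j keeps only the sum mod 4
          $display("beat %d sends doubleword %d", i, j);
        end

      // Odd parity including the parity bit: byte plus p holds an odd
      // number of ones.
      b = 8'h01; p = ~^b;
      $display("byte %h gets parity bit %b", b, p);
      b = 8'h03; p = ~^b;
      $display("byte %h gets parity bit %b", b, p);
    end

endmodule
```

With burst_start set to 2 the doublewords are visited in the order 2, 3, 0, 1. A byte with one bit set already has odd parity, so its parity bit is 0; a byte with two bits set needs a parity bit of 1.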


H. PREDICTION TEST


/*
 * Transaction Sequencer - Prediction Test
 * Filename: sequencer4.v
 * Author: Joseph R. Robert, Jr.
 * Date:    21DEC95
 * Revised: 05JAN96
 *
 * Purpose: This is one in a set of modules which perform a sequence of CPU
 * transactions. This sequencer causes a series of CPU operations that provide
 * a comprehensive test of the PRC. It demonstrates a majority of the PRC's
 * capabilities, showing when the Line Manager selects new lines, when and how
 * the Predictor functions, when the CPU starts a read or write and the data
 * involved. It shows when the Bus Interface Unit fetches data from memory.
 * The DataList reports the flow of data in and out of it. The only significant
 * behavior not exercised by this test is the function of the Line Replacement
 * Unit when the PRC is full. That is handled with Sequencer #5.
 * Sequence #4:
 *   burst_read  00h
 *   burst_read  20h  - PRC should predict 40h and fetch data.
 *   burst_read  180h - PRC should start a new line.
 *   burst_read  1A0h - PRC should predict 1C0h.
 *   burst_read  40h  - already in PRC, should predict 60h.
 *   burst_write 1C0h - should flush line.
 *   burst_read  60h  - already in PRC, predicts 80h.
 *   burst_read  100h - PRC should start a new line.
 * When using this sequencer, set all trace flags to TRUE (except the
 * Controller), and run the simulation for 6000 steps.
 *
 * General Timing instructions for all Sequencers:
 * Use an initial block for each transaction. You must ensure that the
 * following rules are adhered to:
 * 1. Before the first transaction, use
 *      repeat(2)@(posedge clk)
 * 2. Before the first line of the second transaction, use
 *      wait(ABB_==low);
 *      wait(ABB_==hi);
 * 3. There can be only two transactions pipelined at a time. You must ensure
 *    manually that the first operation is complete before the third begins.
 *    When scheduling the current transaction, look at the transaction before
 *    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the
 *    previous transaction to go high.
 * 4. A burst read takes 330 simulation time units = 22 clock cycles.
 */

module  sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type, 
Transfer_code,need_bus_trigger_,ABB_); 

input  clk,ABB_; 

output  pp,need_bus_trigger_; 

output  [0:31]  address; 

output  [0:63]  data; 

output  [0:255]  line; 

output  [0:4]  Transfer_type; 

output  [0:2]  Transfer_size; 

output  [0:1]  Transfer_code; 

reg  pp,need_bus_trigger_; 

reg  [0:31]  address; 

reg  [0:63]  data; 

reg  [0:255]  line; 

reg  [0:4]  Transfer_type; 

reg  [0:2]  Transfer_size; 

reg  [0:1]  Transfer_code; 

//declare  variables,  constants,  parameters 
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

parameter //for Transfer_type
  none              = 5'bz,
  write             = 5'b00010,  //02
  write_atomic      = 5'b10010,  //12
  read              = 5'b01010,  //0A
  read_atomic       = 5'b11010,  //1A
  burst_write       = 5'b00110,  //06
  burst_read        = 5'b01110,  //0E
  burst_read_atomic = 5'b11110;  //1E
parameter //for Transfer_code
  data_transfer     = 2'b00,
  touch_load        = 2'b01,
  instruction_fetch = 2'b10,
  reserved          = 2'b11;

//initialize  signals 
initial 
begin 
pp  <=  0; 

address  <=  32'bz; 
line<=  256'bz; 
end 

//Perform  sequence  of  transactions 
initial
begin
  repeat(2)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000000;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  wait(ABB_==low);
  wait(ABB_==hi);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000020;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(75)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000180;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(100)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h000001A0;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(150)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000040;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(200)@(posedge clk);
  //BURST WRITE
  pp <= ~pp;
  address <= 32'h000001C0;
  Transfer_type <= burst_write;
  Transfer_code <= data_transfer;
  line <= {64'h7777777777777777, 64'h8888888888888888,
           64'h1111111111111111, 64'h3333333333333333};
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(225)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000060;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(250)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000100;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

endmodule 


I.    PREDICTION  TEST  RESULTS 


Host  command:  verilog 
Command  arguments: 
-f  verilog_arguments 

bus_interface.v 

prc.v 

snooper.v 

controller.v 

datalist.v 

line_mgr.v 

predictor.v 

testbench.v 

arbiter.v 

cpu.v 

memory.v 

sequencer5.v 

VERILOG-XL 2.1.2 log file created Feb 2, 1996 13:14:29
VERILOG-XL  2.1.2    Feb  2,1996  13:14:29 

Copyright  (c)  1994  Cadence  Design  Systems,  Inc.  All  Rights  Reserved. 
Unpublished  —  rights  reserved  under  the  copyright  laws  of  the  United  States. 

Copyright  (c)  1994  UNIX  Systems  Laboratories,  Inc.  Reproduced  with  Permission. 

THIS  SOFTWARE  AND  ON-LINE  DOCUMENTATION  CONTAIN  CONFIDENTIAL  INFORMATION 
AND  TRADE  SECRETS  OF  CADENCE  DESIGN  SYSTEMS,  INC.  USE,  DISCLOSURE,  OR 
REPRODUCTION  IS  PROHIBITED  WITHOUT  THE  PRIOR  EXPRESS  WRITTEN  PERMISSION  OF 
CADENCE  DESIGN  SYSTEMS,  INC. 
RESTRICTED  RIGHTS  LEGEND 

Use,  duplication,  or  disclosure  by  the  Government  is  subject  to 
restrictions  as  set  forth  in  subparagraph  (c)(1)(h)  of  the  Rights  in 
Technical Data and Computer Software clause at DFARS 252.227-7013 or
subparagraphs  (c)(1)  and  (2)  of  Commercial  Computer  Software  —  Restricted 
Rights  at  48  CFR  52.227-19,  as  applicable. 

Cadence  Design  Systems,  Inc. 
555  River  Oaks  Parkway 
San  Jose,  California  95134 

For  technical  assistance  please  contact  the  Cadence  Response  Center  at 
1-800-CADENC2 or send email to crc_customers@cadence.com

For  more  information  on  Cadence's  Verilog-XL  product  line  send  email  to 
talkverilog@cadence.com 
Line_mgr selected new ActiveLine = 110 at $d 123787

Line_mgr selected new ActiveLine = 111 at $d 124912

Line_mgr selected new ActiveLine = 112 at $d 126037

Line_mgr selected new ActiveLine = 113 at $d 127162

Line_mgr selected new ActiveLine = 114 at $d 128287

Line_mgr selected new ActiveLine = 115 at $d 129412

Line_mgr selected new ActiveLine = 116 at $d 130537

Line_mgr selected new ActiveLine = 117 at $d 131662

Line_mgr selected new ActiveLine = 118 at $d 132787

Line_mgr selected new ActiveLine = 119 at $d 133912

Line_mgr selected new ActiveLine = 120 at $d 135037


Line_mgr  selected  new  ActiveLine  =  121  at  $d  136162 

Line_mgr  selected  new  ActiveLine  =  122  at  $d  137287 

Line_mgr  selected  new  ActiveLine  =  123  at  $d  138412 

Line_mgr  selected  new  ActiveLine  =  124  at  $d  139537 

Line_mgr selected new ActiveLine = 125 at $d 140662

Line_mgr selected new ActiveLine = 126 at $d 141787

Line_mgr selected new ActiveLine = 127 at $d 142912

Line_mgr selected new ActiveLine =   0 at $d 145162

Line_mgr  selected  new  ActiveLine  =    1  at  $d  146287 

Line_mgr  selected  new  ActiveLine  =   2  at  $d  147412 

Line_mgr  selected  new  ActiveLine  =    3  at  $d  148537 

L122 "testbench.v": $finish at simulation time 152010
31769681 simulation events + 8392 accelerated events
CPU time: 1.0 secs to compile + 0.9 secs to link + 116.2 secs in simulation
End  of  VERILOG-XL  2.1.2    Feb  2,  1996  13:16:34 


J. LINE REPLACEMENT TEST


/*
 * Transaction Sequencer - Line Replacement Test
 * Filename: sequencer5.v
 * Author: Joseph R. Robert, Jr.
 * Date:    05JAN96
 * Revised: 05JAN96
 *
 * Purpose: This is one in a set of modules which perform a sequence of CPU
 * transactions. This Sequencer causes a series of CPU operations which will
 * quickly fill the PRC. This will test the Line Replacement Unit's behavior
 * when it needs to start replacing previously used lines.
 *
 * Sequence #5:
 *   for i = 0 to 132,
 *     burst_read i00h - PRC should switch to new line i.
 *     burst_read i20h - PRC should predict i40h, and store data in line i.
 *   next i
 *
 * When using this sequencer, set all trace flags to FALSE, except for the Line
 * Manager, and run the simulation for 152000 steps.
 *
 * General Timing instructions for all Sequencers:
 * Use an initial block for each transaction. You must ensure that the
 * following rules are adhered to:
 * 1. Before the first transaction, use
 *      repeat(2)@(posedge clk)
 * 2. Before the first line of the second transaction, use
 *      wait(ABB_==low);
 *      wait(ABB_==hi);
 * 3. There can be only two transactions pipelined at a time. You must ensure
 *    manually that the first operation is complete before the third begins.
 *    When scheduling the current transaction, look at the transaction before
 *    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the
 *    previous transaction to go high.
 * 4. A burst read takes 330 simulation time units = 22 clock cycles.
 */

module sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type,
Transfer_code,need_bus_trigger_,ABB_);

input  clk,ABB_; 
output  pp,need_bus_trigger_; 
output  [0:31]  address; 
output  [0:63]  data; 
output  [0:255]  line; 
output  [0:4]  Transfer_type; 
output  [0:2]  Transfer_size; 
output  [0:1]  Transfer_code; 
reg pp,need_bus_trigger_;

reg  [0:31]  address; 

reg  [0:63]  data; 

reg  [0:255]  line; 

reg  [0:4]  Transfer_type; 

reg  [0:2]  Transfer_size; 

reg  [0:1]  Transfer_code; 

//declare  variables,  constants,  parameters 
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

parameter //for Transfer_type
  none              = 5'bz,
  write             = 5'b00010,  //02
  write_atomic      = 5'b10010,  //12
  read              = 5'b01010,  //0A
  read_atomic       = 5'b11010,  //1A
  burst_write       = 5'b00110,  //06
  burst_read        = 5'b01110,  //0E
  burst_read_atomic = 5'b11110;  //1E
parameter //for Transfer_code
  data_transfer     = 2'b00,
  touch_load        = 2'b01,
  instruction_fetch = 2'b10,
  reserved          = 2'b11;

//Other internal control signals
reg [0:7] i; //counter

//initialize  signals 
initial 
begin 
pp  <=  0; 

address  <=  32'bz; 
line  <=  256'bz; 
end 

//Perform  sequence  of  transactions 
initial
begin
  repeat(2)@(posedge clk);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000000;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  wait(ABB_==low);
  wait(ABB_==hi);
  //BURST READ
  pp <= ~pp;
  address <= 32'h00000020;
  Transfer_type <= burst_read;
  Transfer_code <= data_transfer;
  need_bus_trigger_ <= #4 low;
  need_bus_trigger_ <= #6 hi;
end

initial
begin
  repeat(25)@(posedge clk);
  for (i=1; i<=132; i=i+1)
  begin
    repeat(50)@(posedge clk);
    //BURST READ
    pp <= ~pp;
    address <= {12'b0, i, 12'b0};
    Transfer_type <= burst_read;
    Transfer_code <= data_transfer;
    need_bus_trigger_ <= #4 low;
    need_bus_trigger_ <= #6 hi;
    repeat(25)@(posedge clk);
    //BURST READ
    pp <= ~pp;
    address <= {12'b0, i, 12'h020};
    Transfer_type <= burst_read;
    Transfer_code <= data_transfer;
    need_bus_trigger_ <= #4 low;
    need_bus_trigger_ <= #6 hi;
  end
end
endmodule 
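
The loop in sequencer5.v builds each address by concatenation. A minimal sketch, using a hypothetical module seq5_addr_demo that is not part of the thesis sources, prints the first few address pairs so the spacing between lines can be seen; it assumes the same 8-bit counter i and 32-bit address as the listing.

```verilog
// Hypothetical demo, not part of the thesis sources.
module seq5_addr_demo;

  reg [0:7]  i;         // loop counter, as in sequencer5.v
  reg [0:31] address;

  initial
    for (i = 1; i <= 3; i = i + 1)
      begin
        // 12 zero bits, the 8-bit counter, 12 low-order bits: 32 bits total.
        address = {12'b0, i, 12'b0};
        $display("i = %d: first read  %h", i, address);
        address = {12'b0, i, 12'h020};
        $display("i = %d: second read %h", i, address);
      end

endmodule
```

Because i sits above the 12 low-order bits, consecutive values of i are 1000h bytes apart, so each first read falls outside every existing prediction and should force a new line; the second read, 20h later, gives the Predictor the stride it needs.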

K.    LINE  REPLACEMENT  TEST  RESULTS 


Host  command:  verilog 
Command  arguments: 
-f  verilog_arguments 

bus_interface.v 

prc.v 

snooper.v 

controller.v 

datalist.v 

line_mgr.v 

predictor.v 

testbench.v 

arbiter.v 

cpu.v 

memory.v 

sequencer4.v 

VERILOG-XL  2.1.2  log  file  created  Feb  2,  1996  13:22:22 
VERILOG-XL  2.1.2   Feb  2,1996  13:22:22 

Copyright  (c)  1994  Cadence  Design  Systems,  Inc.  All  Rights  Reserved. 
Unpublished  —  rights  reserved  under  the  copyright  laws  of  the  United  States. 

Copyright  (c)  1994  UNIX  Systems  Laboratories,  Inc.  Reproduced  with  Permission. 

THIS  SOFTWARE  AND  ON-LINE  DOCUMENTATION  CONTAIN  CONFIDENTIAL  INFORMATION 
AND  TRADE  SECRETS  OF  CADENCE  DESIGN  SYSTEMS,  INC.  USE,  DISCLOSURE,  OR 
REPRODUCTION  IS  PROHIBITED  WITHOUT  THE  PRIOR  EXPRESS  WRITTEN  PERMISSION  OF 
CADENCE  DESIGN  SYSTEMS,  INC. 
RESTRICTED  RIGHTS  LEGEND 
Use, duplication, or disclosure by the Government is subject to
restrictions  as  set  forth  in  subparagraph  (c)(1)(h)  of  the  Rights  in 
Technical  Data  and  Computer  Software  clause  at  DFARS  252.227-7013  or 
subparagraphs  (c)(1)  and  (2)  of  Commercial  Computer  Software  —  Restricted 
Rights  at  48  CFR  52.227-19,  as  applicable. 

Cadence  Design  Systems,  Inc. 
555  River  Oaks  Parkway 
San  Jose,  California  95134 

For  technical  assistance  please  contact  the  Cadence  Response  Center  at 
1-800-CADENC2  or  send  email  to  crc_customers@cadence.com 

For  more  information  on  Cadence's  Verilog-XL  product  line  send  email  to 
talkverilog@cadence.com

Compiling  source  file  "bus_interface.v" 
Compiling  source  file  "prc.v" 
Compiling  source  file  "snooper.v" 
Compiling  source  file  "controller.v" 
Compiling  source  file  "datalist.v" 
Compiling  source  file  "line_mgr.v" 
Compiling  source  file  "predictor.v" 
Compiling  source  file  "testbench.v" 
Compiling  source  file  "arbiter.v" 
Compiling  source  file  "cpu.v" 
Compiling source file "memory.v"
Compiling  source  file  "sequencer4.v" 
Highest  level  modules: 
testbench 

Line_mgr selected new ActiveLine =   0 at $d 5
CPU started read from address 00000000 at time 45.
CPU read: 0001020304050607 at 181
CPU read: 08090a0b0c0d0e0f at 241
CPU read: 1011121314151617 at 301
CPU read: 18191a1b1c1d1e1f at 361
CPU started read from address 00000020 at time 390.
BIU started read from address 00000040 at time 412.
CPU read: 2021222324252627 at 496
CPU read: 28292a2b2c2d2e2f at 556
CPU read: 3031323334353637 at 616
CPU read: 38393a3b3c3d3e3f at 676
BIU: 4041424344454647 at 812
BIU: 48494a4b4c4d4e4f at 872
BIU: 5051525354555657 at 932
BIU: 58595a5b5c5d5e5f at 992
DATALIST uploaded this data into line 00 at time 1008.
404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f
CPU started read from address 00000180 at time 1140.
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Line_mgr  selected  new  ActiveLine  =    1  at  $d  1 162 

CPU  read:  0001020304050607  at  1276 

CPU  read:  08090a0b0c0d0e0f  at  1336 

CPU  read:  1011121314151617  at  1396 

CPU  read:  18191alblcldlelf  at  1456 

CPU  started  read  from  address  00000 1  aO  at  time  1515. 

CPU  read:  2021222324252627  at  1651 

BIU  started  read  from  address  OOOOOlcO  at  time  1657. 

CPU  read:  28292a2b2c2d2e2f  at  1711 

CPU  read:  3031323334353637  at  1771 

CPU  read:  38393a3b3c3d3e3f  at  1831 

BIU:  4041424344454647  at  1967 

BIU:  48494a4b4c4d4e4f  at  2027 

BIU: 5051525354555657  at  2087 

BIU:  58595a5b5c5d5e5f  at  2147 

DATALIST  uploaded  this  data  into  line  01  at  time  2163. 

404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 
CPU  started  read  from  address  00000040  at  time  2265. 

Line_mgr  selected  new  ActiveLine  =    0  at  $d  2287 

DATALIST  downloaded  this  data  from  line  00  at  time  2313. 

404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 

CPU  read:  4041424344454647  at  2356 

CPU  read:  48494a4b4c4d4e4f  at  2371 

CPU  read:  5051525354555657  at  2386 

CPU  read:  58595a5b5c5d5e5f  at  2401 

BIU  started  read  from  address  00000060  at  time  2482. 

BIU:  6061626364656667  at  2627 

BIU:  68696a6b6c6d6e6f  at  2687 

BIU:  7071727374757677  at  2747 

BIU:  78797a7b7c7d7e7f  at  2807 

DATALIST  uploaded  this  data  into  line  00  at  time  2823. 

606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 
CPU  started  write  to  address  000001c0  at  time  3007. 

CPU  write  beat  1:  7777777777777777  at  3022 

Line_mgr  selected  new  ActiveLine  =    1  at  $d  3037 

Line  manager  flushed  line    1  at  time  3048. 

CPU  write  beat  2:  8888888888888888  at  3158 

CPU  write  beat  3:  1111111111111111  at  3218 

CPU  write  beat  4:  3333333333333333  at  3278 

CPU  started  read  from  address  00000060  at  time  3390. 

Line_mgr  selected  new  ActiveLine  =    0  at  $d  3412 

DATALIST  downloaded  this  data  from  line  00  at  time  3438. 

606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 

CPU  read:  6061626364656667  at  3481 

CPU  read:  68696a6b6c6d6e6f  at  3496 

CPU  read:  7071727374757677  at  3511 

CPU  read:  78797a7b7c7d7e7f  at  3526 

BIU  started  read  from  address  00000080  at  time  3607. 

BIU:  0001020304050607  at  3752 

BIU:  08090a0b0c0d0e0f  at  3812 

BIU:  1011121314151617  at  3872 

BIU:  18191a1b1c1d1e1f  at  3932 

CPU  started  read  from  address  00000100  at  time  3945. 

DATALIST  uploaded  this  data  into  line  00  at  time  3948. 

000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f 
Line_mgr  selected  new  ActiveLine  =    2  at  $d  3982 

CPU  read:  0001020304050607  at  4066 

CPU  read:  08090a0b0c0d0e0f  at  4126 

CPU  read:  1011121314151617  at  4186 

CPU  read:  18191a1b1c1d1e1f  at  4246 

L123  "testbench.v":  $finish  at  simulation  time  6010 
1661039  simulation  events  +  265  accelerated  events 
CPU  time:  0.8  secs  to  compile  +  0.8  secs  to  link  +  5.0  secs  in  simulation 
End  of  VERILOG-XL  2.1.2    Feb  2,  1996  13:22:29 
APPENDIX  D.  PRC  STRUCTURE  FILES 


This  appendix  contains  the  Verilog  files  for  the  final 
hardware  design.  They  include  the  Verilog  structural  models 
of  the  PRC  and  the  testing  results.  The  files  are  located  on 
the  ECE  system  at  home5/robert/thesis/epoch/verilog. 


A.    PRC 


*  Predictive  Read  Cache 

*  Filename:  prc.v 

*  Author:  Joseph  R.  Robert,  Jr. 

*  Date:     02OCT95 

*  Revised:  14MAR96 

Purpose:  This  module  emulates  the  predictive  read  cache,  connecting  all  the  parts. 

module prc (HRESET_,clk,BG_,DBG_,BR_,CANX,D,A,DP,TT,AP,TSIZ,TC,ABB_,AACK_,TS_,
TBST_,DBB_,TA_,DPE_);

// epoch set_attribute FIXEDBLOCK = 1

input  HRESET_,clk,BG_,DBG_; 

output  BR_,CANX; 

inout  [63:0]  D; 

inout  [31:0]  A; 

inout  [7:0]  DP; 

inout  [4:0]  TT; 

inout  [3:0]  AP; 

inout  [2:0]  TSIZ; 

inout  [1:0]  TC; 

inout  ABB_,AACK_,TS_,TBST_,DBB_,TA_,DPE_; 

wire  [255:0]  DATALINE; 
wire  [26:0]  CAR,NAR,MRMA; 
wire  [6:0]  ActiveLine; 
wire  [1:0]  BURSTSTART; 

wire fetch_done,fetch_abort,send_done,read,write,hit,line_empty,
snoop_ignore,upload,download,BR_,CANX;

//Connect  parts  which  have  been  converted  to  hardware. 

//  epoch  precompiled  predictor 

predictor PRE1 (MRMA,CAR[25:0],predict,NAR,HRESET_);

// epoch precompiled line_mgr

line_mgr LM1 (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
new_replace,MRMA,ActiveLine,line_empty,hit,clk);

// epoch precompiled datalist

datalist DL1 (DATALINE,ActiveLine,upload,download);

// epoch precompiled snooper

snooper SN1 (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
read,write,HRESET_);

// epoch precompiled bus_interface

bus_interface BIU1 (NAR,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
clk,BR_,upload,download,fetch_done,fetch_abort,
send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

// epoch precompiled controller

controller CON1 (HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
line_empty,a_select,test,predict,store,
flush,send,hold,new_replace,fetch,clk);


endmodule 


B.    CONTROLLER 


*  CONTROLLER 

*  Filename:  controller.v 

*  Author:  Joseph  R.  Robert,  Jr. 

*  Date:     21DEC95 
*  Revised:  20MAR96 

Purpose:  This  module  is  a  Finite  State  Machine  which  coordinates  the  actions  of  all  the  other  functional 
blocks  of  the  PRC.  All  control  signals  are  synchronous  with  the  system  clock.  HRESET_  causes  the  Controller 
to  go  to  the  IDLE  state.  The  state  diagram  and  state  output  tables  give  more  details. 

Of  significance  are  the  wait  states  added  to  the  state  diagram  of  the  behavioral  model.  These  changes  are 
highlighted  in  the  State  Output  Table.  The  changes  were  required  by  the  Line  Manager,  in  which  there  is  a 
significant  propagation  delay  for  the  addresses.  This  is  described  in  more  detail  in  the  Line  Manager  section  of  this 
chapter.  This  is  a  prime  candidate  for  future  work  to  improve  the  PRC's  design. 
* 

module controller (HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
line_empty,a_select,test,predict,store,
flush,send,hold,new_replace,fetch,clk);

// epoch set_attribute FIXEDBLOCK = 1

input HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,line_empty,clk;
output a_select,test,predict,store,flush,send,hold,new_replace,fetch;

reg  a_select,test,predict,store,flush,send,hold,new_replace,fetch; 
//Finite  State  Machine 


parameter // epoch enum stat
idle          = 5'd0,
test_car_r    = 5'd1,
send_data     = 5'd2,
test_nar      = 5'd3,
fetch_data    = 5'd4,
is_line_empty = 5'd5,
predict_na    = 5'd6,
store_car     = 5'd7,
test_car_w    = 5'd8,
flush_line    = 5'd9,
wait_a        = 5'd10,
wait_b        = 5'd11,
wait_c        = 5'd12,
wait_d        = 5'd13,
wait_e        = 5'd14,
wait_f        = 5'd15,
wait_g        = 5'd16,
wait_h        = 5'd17,
wait_i        = 5'd18,
dc_state      = 5'bx;

reg [4:0] /* epoch enum stat */ state, next_state;

reg  a_select,fetch,flush,hold,new_replace,predict,send,store,test; 
always @(posedge clk or negedge HRESET_)
begin
if (!HRESET_)
state = idle;
else
state = next_state;
end

always  @(state  or  read  or  write  or  hit  or  send_done  or  line_empty  or 
fetch_done  or  fetch_abort) 
begin 

//default  values 

a_select    = 1'b0; //CAR
fetch       = 1'b0;
flush       = 1'b0;
hold        = 1'b0;
new_replace = 1'b0;
predict     = 1'b0;
send        = 1'b0;
store       = 1'b0;
test        = 1'b0;

case  (state) 

idle: //0
begin
if      (read == 1'b0 & write == 1'b0) next_state = idle;
else if (read == 1'b0 & write == 1'b1) next_state = wait_d;
else if (read == 1'b1) next_state = wait_a;
else next_state = dc_state;
end

wait_a: //10
begin
hold       = 1'b1;
next_state = wait_b;
end

wait_b: //11
begin
hold       = 1'b1;
next_state = wait_c;
end

wait_c: //12
begin
hold       = 1'b1;
next_state = test_car_r;
end

wait_d: //13
begin
hold       = 1'b1;
next_state = wait_e;
end

wait_e: //14
begin
hold       = 1'b1;
next_state = wait_f;
end

wait_f: //15
begin
hold       = 1'b1;
next_state = test_car_w;
end

test_car_r: //1
begin
test       = 1'b1;
hold       = 1'b1;
if (hit)
next_state = send_data;
else next_state = is_line_empty;
end

send_data: //2
begin
a_select   = 1'b1; //NAR
predict    = 1'b1;
send       = 1'b1;
hold       = 1'b1;
if (send_done)
next_state = test_nar;
else next_state = send_data;
end

test_nar: //3
begin
a_select   = 1'b1; //NAR
test       = 1'b1;
hold       = 1'b1;
if ({hit,read,write} == 3'b000) next_state = fetch_data;
else next_state = idle;
end

fetch_data: //4
begin
a_select   = 1'b1; //NAR
hold       = 1'b1;
fetch      = 1'b1;
if ({fetch_done,fetch_abort} == 2'b00) next_state = fetch_data;
else next_state = idle;
end

is_line_empty: //5
begin
hold       = 1'b1;
if (line_empty)
next_state = store_car;
else next_state = predict_na;
end

predict_na: //6
begin
a_select    = 1'b1;
predict     = 1'b1;
hold        = 1'b1;
new_replace = 1'b1;
next_state  = wait_g;
end

wait_g: //16
begin
a_select   = 1'b1; //NAR
hold       = 1'b1;
next_state = wait_h;
end

wait_h: //17
begin
a_select   = 1'b1; //NAR
hold       = 1'b1;
next_state = wait_i;
end

wait_i: //18
begin
a_select   = 1'b1; //NAR
hold       = 1'b1;
next_state = test_nar;
end

store_car: //7
begin
store      = 1'b1;
hold       = 1'b1;
next_state = idle;
end

test_car_w: //8
begin
test       = 1'b1;
hold       = 1'b1;
if (hit)
next_state = flush_line;
else next_state = idle;
end

flush_line: //9
begin
flush      = 1'b1;
hold       = 1'b1;
next_state = idle;
end

default: 
begin 
next_state  =  dc_state; 
end 

endcase 
end 

endmodule 
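
The two-block coding style of the Controller (a clocked state register plus a combinational next-state and output block) can be illustrated with a much smaller machine. The sketch below is not the thesis controller: the module name ctrl_read_path_sketch is hypothetical, only the read path from idle through the three wait states to test_car_r is modeled, and test_car_r simply returns to idle instead of branching on hit. It also assumes nonblocking assignments in the clocked block, a common convention that differs from the blocking assignments in the listing above.

```verilog
// Hypothetical reduced model of the Controller read path only:
// idle -> wait_a -> wait_b -> wait_c -> test_car_r -> idle.
module ctrl_read_path_sketch (HRESET_, clk, read, hold, test);
  input  HRESET_, clk, read;
  output hold, test;
  reg    hold, test;

  parameter idle       = 3'd0,
            wait_a     = 3'd1,
            wait_b     = 3'd2,
            wait_c     = 3'd3,
            test_car_r = 3'd4;

  reg [2:0] state, next_state;

  // State register: asynchronous active-low reset to idle.
  always @(posedge clk or negedge HRESET_)
    if (!HRESET_) state <= idle;
    else          state <= next_state;

  // Next-state and output logic. Defaults are assigned first so that
  // every output has a value in every state (no inferred latches).
  always @(state or read) begin
    hold       = 1'b0;
    test       = 1'b0;
    next_state = idle;
    case (state)
      idle:       next_state = read ? wait_a : idle;
      wait_a:     begin hold = 1'b1; next_state = wait_b;     end
      wait_b:     begin hold = 1'b1; next_state = wait_c;     end
      wait_c:     begin hold = 1'b1; next_state = test_car_r; end
      test_car_r: begin hold = 1'b1; test = 1'b1; next_state = idle; end
      default:    next_state = idle;
    endcase
  end
endmodule
```

In this sketch hold is raised for three clock cycles before test is asserted, which mirrors the purpose of the wait states: the address has time to propagate through the Line Manager before the comparison result is sampled.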


C.    SNOOPER 


*  SNOOPER 

*  Filename:  snooper.v 

*  Author:  Joseph  R.  Robert,  Jr. 

*  Date:     21DEC95 

*  Revised:  06MAR96 

Purpose:  This  module  watches  the  system  bus  activity,  and  makes  appropriate  reports  to  the  PRC 
Controller. 

If the transaction is a data burst read or any kind of write, and if the address parity is correct, then the read
or write signal is asserted as appropriate, and the address is placed in the CAR. The snoop_ignore signal tells this
unit to ignore the current transaction, because it was initiated by the Bus Interface Unit. The snoop_ignore signal
must be asserted concurrently with the transfer attributes. Reads that are not burst or data related are ignored by
the PRC. The CAR is updated only on transactions relevant to the PRC.

Due to the two-stage pipelining capability of the PowerPC with respect to memory accesses, a second
address tenure can occur shortly after the first, well before the first data tenure is complete. To compensate for this,
the read and write outputs of the Snooper remain asserted until acknowledged by the Controller with hold. The
rising edge of hold indicates that the read or write signal was received by the Controller. The Snooper can then
negate these signals, but must leave CAR alone until hold is negated. After hold is negated, CAR can be updated
to the new address.

In Stage 0, the transfer attributes are latched in registers. Combinational logic determines if these transfer
attributes represent a valid read or a valid write, and if the address parity is correct. If the transaction is valid,
and one that the PRC is interested in, then Stage 0 raises a transaction_waiting signal.

A Finite State Machine in Stage 1 sits in the IDLE state until it receives that signal. Then it latches
the signals needed from Stage 0, resets the transaction_waiting signal, and then waits for the hold signal to go low.
A high hold signal indicates that the PRC is not done with the previous transaction. Once hold goes low, the read
and write flags are set according to the type of the current transaction. Also, the input address is stored in the
Current Address Register. The FSM then waits for the rising edge of hold before returning to the IDLE state, where
it can check if there is another transaction waiting.

module  snooper  (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART, 
read_flag,write_flag,HRESET_); 

//  epoch  set_attribute  FIXEDBLOCK  =  1 

input  [31:0]  A; 

input  [3:0]  AP; 

input  [4:0]  TT; 

input  [1:0]  TC; 

input  TS_,snoop_ignore,hold,clk,HRESET_; 

output [26:0] CAR;

output [1:0] BURSTSTART;

output  read_flag,write_flag; 

wire [31:0] address0;
wire [28:0] address1;
wire [26:0] CAR;
wire [4:0] TransferType;
wire [3:0] addr_parity;
wire [1:0] BURSTSTART,TransferCode;
wire car_latch,flag_reset_,hold_,ignore,latch0,latch1,parity_error,
read_flag,read_set_,TS,transaction_waiting,tw_set,tw_reset_,
valid_op,valid_read0,valid_read1,valid_write0,valid_write1,
w1,w2,w3,w5,w6,w7,write_flag_,write_set_,prelatch0;


//STAGE  0 

//Stage  0  latches 

stdinv TS_INV (TS_,TS);

stddff TS_Latch (.CLK(clk),.D(TS),.Q(prelatch0));

stdbuf Latch0Buffer (.IN0(prelatch0),.Y(latch0));

dff #(32,0,"AUTO","1") AddressLatch0 (.CLK(latch0),.D(A),.Q(address0));

dff #( 4,0,"AUTO","1") AddrParityLatch (.CLK(latch0),.D(AP),.Q(addr_parity));

dff #( 5,0,"AUTO","1") TransferTypeLatch (.CLK(latch0),.D(TT),.Q(TransferType));

dff #( 2,0,"AUTO","1") TransferCodeLatch (.CLK(latch0),.D(TC),.Q(TransferCode));

stddff IgnoreLatch (.CLK(latch0),.D(snoop_ignore),.Q(ignore));

//Odd parity checker
parityo_chk32
OddParityChecker (.D(address0),.PIN(addr_parity),.ERROR(parity_error));

//Read checker

stdnor2 NOR_C (TransferCode[1],TransferCode[0],w1);

stdinv INV_F (TransferType[0],TransferType0_);

stdand4 AND_D (TransferType[3],TransferType[2],TransferType[1],TransferType0_,
w2);
stdand2 AND_E (w1,w2,valid_read0);

//Write checker

stdinv   INV_J  (TransferType[3],TransferType3_);

stdnand2 NAND_H (TransferType[4],TransferType[2],w3);

stdand4  AND_G  (TransferType3_,TransferType[1],TransferType0_,w3,valid_write0);

//Transaction  checker 

stdnor2  NOR_L  (valid_write0,valid_read0,w5); 

stdnor3  NOR_M  (parity_error,w5,ignore,valid_transaction); 

//Transaction  Waiting  Latch 

stdand2 TW_SetAND (latch0,valid_transaction,tw_set);
stdand2 TW_ResetAND (tw_reset1_,HRESET_,tw_reset_);
stdlatch_c TW_Latch
(.D(tw_set),.CLR(tw_reset_),.EN(latch0),.Q(transaction_waiting));

//STAGE  1 

//Stage  1  latches 
dff #(29,0,"AUTO","1")
AddressLatch1 (.CLK(latch1),.D(address0[31:3]),.Q(address1));
stddff ValidReadLatch1 (.CLK(latch1),.D(valid_read0),.Q(valid_read1));
stddff ValidWriteLatch1 (.CLK(latch1),.D(valid_write0),.Q(valid_write1));

//read and write flags

stdinv HOLD_INV (hold,hold_);

stdand2 FLAG_RESET_AND (.IN0(hold_),.IN1(HRESET_),.Y(flag_reset_));

stddff_c ReadFlagLatch
(.CLK(flag_clk),.CLR(flag_reset_),.D(valid_read1),.Q(read_flag));
stddff_c WriteFlagLatch
(.CLK(flag_clk),.CLR(flag_reset_),.D(valid_write1),.Q(write_flag));

//Current Address Register
dff #(29,0,"AUTO","1")
CA_Register (.CLK(car_latch),.D(address1),.Q({CAR,BURSTSTART}));


//FINITE  STATE  MACHINE 

parameter  //  epoch  enum  stat 
IDLE              = 3'd0,
LATCH             = 3'd1,
OUTPUTS           = 3'd2,
WAIT_FOR_HOLD     = 3'd3,
WAIT_FOR_NOT_HOLD = 3'd4,
dc_state          = 3'bxx;

reg  [2:0]  /*  epoch  enum  stat  */  state,  next_state; 
reg  latchl,tw_resetl_,flag_clk,car_latch; 

always @(posedge clk or negedge HRESET_)
begin
if (!HRESET_)
state = IDLE;
else
state = next_state;
end

always  @(state  or  transaction_waiting  or  hold) 
begin 

//default  values 
latch1     = 1'b0;
tw_reset1_ = 1'b1;
flag_clk   = 1'b0;
car_latch  = 1'b0;

case  (state) 

IDLE:  begin 

if  (transaction_waiting) 
next_state  =  LATCH; 
else  next_state  =  IDLE; 
end 

LATCH:  begin 

latch1     = 1'b1;
tw_reset1_ = 1'b0;
if  (hold) 
next_state  =  WAIT_FOR_NOT_HOLD; 
else  next_state  =  OUTPUTS; 
end 

WAIT_FOR_NOT_HOLD:  begin 
if  (hold) 

next_state  =  WAIT_FOR_NOT_HOLD; 
else  next_state  =  OUTPUTS; 
end 

OUTPUTS:  begin 
flag_clk   = 1'b1;
car_latch  = 1'b1;
next_state = WAIT_FOR_HOLD;
end

WAIT_FOR_HOLD: begin
if (hold)
next_state = IDLE;
else next_state = WAIT_FOR_HOLD;
end

default: begin
next_state = dc_state;
end

endcase 

end 

endmodule 
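
The Stage 0 read, write, and transaction checkers are small enough to restate as Boolean equations. The following sketch collapses the gate instances named in the comments into continuous assignments. The module name snoop_decode_sketch is hypothetical, and the sketch assumes that the std* cells are plain active-high gates whose last positional port is the output.

```verilog
// Behavioral restatement of the Stage 0 checkers (illustrative only).
module snoop_decode_sketch (TT, TC, parity_error, ignore,
                            valid_read, valid_write, valid_transaction);
  input  [4:0] TT;            // latched transfer type
  input  [1:0] TC;            // latched transfer code
  input        parity_error;  // from the odd parity checker
  input        ignore;        // latched snoop_ignore
  output       valid_read, valid_write, valid_transaction;

  // NOR_C, INV_F, AND_D, AND_E: TC == 00 and TT[3:0] == 1110.
  assign valid_read  = ~(TC[1] | TC[0]) & TT[3] & TT[2] & TT[1] & ~TT[0];

  // INV_J, NAND_H, AND_G: TT[3] = 0, TT[1] = 1, TT[0] = 0,
  // and TT[4], TT[2] not both 1.
  assign valid_write = ~TT[3] & TT[1] & ~TT[0] & ~(TT[4] & TT[2]);

  // NOR_L, NOR_M: a read or a write, with good parity, not ignored.
  assign valid_transaction =
         ~(parity_error | ~(valid_write | valid_read) | ignore);
endmodule
```

Note that, as in the gate listing, TT[4] does not take part in the read check.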

1.    Thirty-Two-Input,  Odd-Parity  Checker 


*  ODD  PARITY  CHECKER 

*  Filename:  parityo_chk32.v 

*  Author:  Joseph  R.  Robert,  Jr. 

*  Date:     12FEB96 

*  Revised:  12FEB96 

* 

Purpose:  This  module  checks  the  parity  of  the  input  data,  comparing  it  to  the  input  parity.  Parity  is  odd  including 
the  parity  bit. 

module parityo_chk32 (D,PIN,ERROR);

input  [31:0]  D; 
input  [3:0]  PIN; 
output  ERROR; 

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR;

parityco #(8,0,"AUTO","1")
parity_group_0 (.D(D[ 7: 0]),.PIN(PIN[0]),.ERROR(ERROR_0));
parityco #(8,0,"AUTO","1")
parity_group_1 (.D(D[15: 8]),.PIN(PIN[1]),.ERROR(ERROR_1));
parityco #(8,0,"AUTO","1")
parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2));
parityco #(8,0,"AUTO","1")
parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3));

stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR);


endmodule 
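
A behavioral equivalent of the parity checker may help when reading the structural version. The sketch below is an assumption about what the parityco library cell computes, based only on the statement that parity is odd including the parity bit: each group of eight data bits plus its parity bit should contain an odd number of ones, and ERROR is raised otherwise. The module name parityo_chk32_sketch is hypothetical. The pairing of PIN[n] with data byte n follows the listing.

```verilog
// Behavioral sketch of a 32-bit odd-parity checker, one parity bit per byte.
module parityo_chk32_sketch (D, PIN, ERROR);
  input  [31:0] D;
  input  [3:0]  PIN;
  output        ERROR;

  wire [3:0] err;

  // The reduction XOR of the nine bits is 1 when the number of ones is odd,
  // so an error is the inverse of that result.
  assign err[0] = ~(^{D[ 7: 0], PIN[0]});
  assign err[1] = ~(^{D[15: 8], PIN[1]});
  assign err[2] = ~(^{D[23:16], PIN[2]});
  assign err[3] = ~(^{D[31:24], PIN[3]});

  // Any failing byte flags the whole address (the stdor4 in the listing).
  assign ERROR = |err;
endmodule
```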


D.    LINE  MANAGER 


LINE  MANAGER 
Filename:  line_mgr.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  20MAR96 

Purpose:  The  function  of  this  module  is  completely  described  in  the  behavioral  model. 

This  structural  model  uses  a  high  speed  RAM  (hsram)  for  the  MRMA  List.  The  CAR  is  stored  into  this 
RAM  on  a  store  or  fetch_done  signal. 

The  predicted_ma_list  is  a  register  file  for  storing  predicted  memory  addresses.  This  list  is  composed 
of  128  address  registers,  128  equality  comparators,  and  128  Valid  status  flags.  The  NAR  is  stored  in  this  list  at 
the  fetch_done  pulse.  If  there  is  a  match  with  the  input  address  (in_addr),  a  priority  encoder  (ENC_C)  determines 
which  line  matches. 

The  line  replacement  unit  determines  the  next  line  to  be  replaced  whenever  the  PRC  needs  to  start  a  new 
line.  It  first  selects  invalid  lines.  If  all  the  lines  are  valid,  then  it  selects  lines  that  have  been  "aged".  A  priority 
encoder  (ENC_1)  chooses  the  line  with  the  lowest  index  among  all  the  lines  that  can  be  replaced.  If  all  lines  are 
valid,  the  encoder's  output  enable  (oe)  signal  is  used  to  cause  aging. 

Aging  is  accomplished  by  the  use  of  a  7-bit  counter  (ager_counter),  initially  set  to  zero.  When  the 
cause_aging  signal  from  the  encoder  is  high,  the  counter  advances.  A  decoder  (DEC_B)  output  causes  the 
appropriate  Aged  flag  to  be  set. 

A change in the value of the CAR or NAR takes 25 ns (1.8 cycles) to propagate through the input
address multiplexer (in_addr mux). This required the addition of wait states in the Controller before each of the
tests. The Revised Controller State Diagram and Revised Controller State Output Table show the required changes.

module  line_mgr  (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store, 
new_replace,MRMA_out,ActiveLine,line_empty,hit,clk); 
//  epoch  set_attribute  FIXEDBLOCK  =  1 

input  [26:0]  CAR,NAR; 

input  HRESET_,a_select,test,fetch_done,flush,store,new_replace,clk; 

output  [26:0]  MRMA_out; 

output  [6:0]  ActiveLine; 

output line_empty,hit;

wire [127:0] Valid;
wire [26:0] in_addr,in_addr_buf,MRMA_out;
wire [6:0] ActiveLine,ReplaceLine,match_line,w1;
wire MRMA_write,match,all_lines_valid,a_select_,w2,hit,
hreset,store_,line_empty,le_set_;

//Address  multiplexer 
mux2 #(27,0,"AUTO","1")
addrmux (.IN0(CAR),.IN1(NAR),.S0(a_select),.Y(in_addr));
buff #(27,0,"AUTO","20") InAddrBuffer (.IN0(in_addr),.Y(in_addr_buf));

//MRMA_list

stdnor2 MRMA_NOR (.IN0(store),.IN1(fetch_done),.Y(MRMA_write_));

hsram #(27,128,7,32,1,"2")
MRMA_list (.A(ActiveLine),.DIN(CAR),.WR(MRMA_write_),.DOUT(MRMA_out));

//PredMA_list

// epoch precompiled predicted_ma_list

predicted_ma_list
PredMA_list1 (NAR,in_addr_buf,ActiveLine,fetch_done,flush,HRESET_,
Valid,match_line,match);
and128 all_valid_ands (Valid,all_lines_valid);

//Line  Replacement  Unit 

//  epoch  precompiled  line_replacement_unit 

line_replacement_unit LRU1 (Valid,ActiveLine,all_lines_valid,
new_replace,fetch_done,HRESET_,clk,ReplaceLine);

//ActiveLine  pointer 

stdbufinv a_select_inv (.IN0(a_select),.Y(a_select_));

stdand2 AL_AND (.IN0(test),.IN1(a_select_),.Y(w2));

mux2 #(7,0,"AUTO","1")
al_mux (.IN0(ReplaceLine),.IN1(match_line),.S0(match),.Y(w1));
dff_c #(7,0,"AUTO","1")
ActiveLineReg (.CLK(w2),.CLR(HRESET_),.D(w1),.Q(ActiveLine));

//Hit status flag
stdlatch hit_latch (.D(match),.EN(test),.Q(hit));

//line_empty status flag

stdbufinv HRESET_inv (.IN0(HRESET_),.Y(hreset));

stdbufinv store_inv (.IN0(store),.Y(store_));

stdnor2 LE_NOR (.IN0(hreset),.IN1(new_replace),.Y(le_set_));

srlatch line_empty_latch (.S_(le_set_),.R_(store_),.Q(line_empty));

endmodule 
1.    Address  Register  With  Equal  Comparator 


ADDRESS  REGISTER  WITH  EQUALITY  COMPARATOR  for  PredMA  storage 

Filename:  addre.v 

Author:  Joseph  R.  Robert,  Jr. 

Date:     21DEC95 

Revised:  13FEB96 

Purpose:  This  structural  model  is  a  building  block  for  the  Predicted  Memory  Address  List  (PredMA_List).  It 
consists  of  a  single  27-bit  register  and  an  equality  comparator.  The  output  of  the  register  is  compared  with  the 
input  address  (in_addr). 

module  addre  (NAR,in_addr,store_enable,eq,HRESET_); 

// epoch set_attribute FIXEDBLOCK = 1

input  [26:0]  NAR,in_addr; 
input  store_enable,HRESET_; 
output  eq; 

wire  [26:0]  wl; 
wire  eq; 

dff_c #(27,0,"AUTO","1") PredMA_reg (.CLK(store_enable),.CLR(HRESET_),
.D(NAR),.Q(w1));
equal #(27,0,"AUTO","1") equal1 (.A(w1),.B(in_addr),.Y(eq));

endmodule 
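
For comparison, the same building block can be written behaviorally. This is a sketch under two assumptions that the listing does not confirm: that the dff_c cell captures D on the rising edge of CLK, and that its CLR input is an asynchronous active-low clear (suggested by its connection to HRESET_). The module name addre_sketch and the register name pred_ma are hypothetical.

```verilog
// Behavioral sketch of one Predicted Memory Address entry:
// a 27-bit register whose contents are compared with in_addr.
module addre_sketch (NAR, in_addr, store_enable, eq, HRESET_);
  input  [26:0] NAR, in_addr;
  input         store_enable, HRESET_;
  output        eq;

  reg [26:0] pred_ma;

  // store_enable acts as the register clock, as in the structural model.
  always @(posedge store_enable or negedge HRESET_)
    if (!HRESET_) pred_ma <= 27'd0;
    else          pred_ma <= NAR;

  // eq is high whenever the stored address matches the input address.
  assign eq = (pred_ma == in_addr);
endmodule
```

Because a cleared register holds address zero, an entry of this kind would also match an input address of zero after reset. In the thesis design the separate Valid flags of the predicted_ma_list presumably qualify the match.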


2.    AND  Gate  With  128  Inputs  and  One  Output 


128-INPUT  AND  GATE 

Filename:  and128.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  20MAR96 

Purpose: This structural model is a 128-input AND gate.

module and128 (in,out);

input [127:0] in;
output out;

wire  out,out_unbuffered; 

wire  [31:0]  A; 
wire  [7:0]  B; 
wire  [1:0]  C; 

and4 #(32,0,"AUTO","1") AND_A (.IN0(in[127:96]),.IN1(in[95:64]),
.IN2(in[63:32]),.IN3(in[31:0]),.Y(A));

and4 #( 8,0,"AUTO","1") AND_B (.IN0(A[31:24]),.IN1(A[23:16]),
.IN2(A[15:8]),.IN3(A[7:0]),.Y(B));

and4 #( 2,0,"AUTO","1") AND_C (.IN0(B[7:6]),.IN1(B[5:4]),
.IN2(B[3:2]),.IN3(B[1:0]),.Y(C));

stdand2 AND_D (.IN0(C[0]),.IN1(C[1]),.Y(out_unbuffered));

stdbuf #("15") OutputBuffer (.IN0(out_unbuffered),.Y(out));

endmodule 
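
The structural model reduces 128 bits to one through a tree of widths 128, 32, 8, 2, and 1. Functionally this is a single reduction AND, as the following sketch shows. The module name and128_sketch is hypothetical, and the sketch ignores the output buffer, which affects drive strength and timing but not logic.

```verilog
// Behavioral sketch of the 128-input AND gate.
module and128_sketch (in, out);
  input  [127:0] in;
  output         out;

  // Reduction AND: out is 1 only when every bit of in is 1.
  assign out = &in;
endmodule
```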


3.    Codefile  for  Seven-to-128  Decoder 

(dec7to128e.codefile) 


//PLA TABLE for 7 to 128 decoder with enable
// in0 in1 in2 in3 in4 in5 in6 EN

00000001  //line
00000011  //line
00000101  //line
00000111  //line
00001001  //line
00001011  //line
00001101  //line
00001111  //line
00010001  //line
00010011  //line
00010101  //line
00010111  //line
00011001  //line
00011011  //line
00011101  //line
00011111  //line
00100001  //line
00100011  //line
00100101  //line
00100111  //line
00101001  //line
00101011  //line
00101101  //line
00101111  //line
00110001  //line
00110011  //line
00110101  //line
00110111  //line
00111001  //line
00111011  //line
00111101  //line
00111111  //line
01000001  //line 32
01000011  //line
01000101  //line
01000111  //line
01001001  //line
01001011  //line
01001101  //line
01001111  //line
01010001  //line
01010011  //line
01010101  //line
01010111  //line
01011001  //line
01011011  //line
01011101  //line
01011111  //line
01100001  //line
01100011  //line
01100101  //line
01100111  //line
01101001  //line
01101011  //line
01101101  //line
01101111  //line
01110001  //line
01110011  //line
01110101  //line
01110111  //line
01111001  //line
01111011  //line
01111101  //line
01111111  //line
10000001  //line 64
10000011  //line
10000101  //line
10000111  //line
10001001  //line
10001011  //line
10001101  //line
10001111  //line
10010001  //line
10010011  //line
10010101  //line
10010111  //line
10011001  //line
10011011  //line
10011101  //line
10011111  //line
10100001  //line
10100011  //line
10100101  //line
10100111  //line
10101001  //line
10101011  //line
10101101  //line
10101111  //line
10110001  //line
10110011  //line
10110101  //line
10110111  //line
10111001  //line
10111011  //line
10111101  //line
10111111  //line
11000001  //line
11000011  //line
11000101  //line
11000111  //line
11001001  //line
11001011  //line
11001101  //line
11001111  //line
11010001  //line
11010011  //line
11010101  //line
11010111  //line
11011001  //line
11011011  //line
11011101  //line
11011111  //line
11100001  //line
11100011  //line
11100101  //line
11100111  //line
11101001  //line
11101011  //line
11101101  //line
11101111  //line
11110001  //line
11110011  //line
11110101  //line
11110111  //line
11111001  //line
11111011  //line
11111101  //line
11111111  //line 128

//END TABLE
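
The codefile above enumerates every input combination for which the enable column is 1. The function it describes can be sketched behaviorally as a shift of a single 1 bit. This is an illustration only: the module name dec7to128e_sketch and its port names are hypothetical, and the sketch assumes active-high one-hot outputs with the seven inputs read as an unsigned binary index, which the codefile itself does not state.

```verilog
// Behavioral sketch of a 7-to-128 decoder with enable.
module dec7to128e_sketch (in, en, out);
  input  [6:0]   in;   // binary line index, 0 to 127 (assumed ordering)
  input          en;   // when low, no output is asserted
  output [127:0] out;  // one-hot, active high (assumed polarity)

  // Shift a single 1 into position 'in' when enabled.
  assign out = en ? (128'd1 << in) : 128'd0;
endmodule
```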


4.    One-Hundred-and-Twenty-Eight-Input,  Seven-Output 
Encoder,  Priority  to  Low  Bits 


128  TO  7  ENCODER,  PRIORITY  LOW 

Filename:  enc128to7lo.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  13FEB96 

Purpose:  This  structural  model  is  a  128-bit  input,  7-bit  output  priority  encoder.  The  highest  priority  is  given  to 
the  bit  with  the  lowest  index.  Inputs  and  outputs  are  active  high.  It  is  composed  of  four  32  to  5  priority  encoders 
and  the  logic  gates  necessary  to  connect  them  together. 

module enc128to7lo (I,A,ei,eo,gs);

// epoch set_attribute FIXEDBLOCK = 1

input  [127:0]  I; 
input  ei; 
output  [6:0]  A; 
output  gs,eo; 

wire [4:0] g0A,g1A,g2A,g3A;
wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc32to5lo ENCg3 (I[127:96],g3A,g2eo,  eo,g3gs);
enc32to5lo ENCg2 (I[ 95:64],g2A,g1eo,g2eo,g2gs);
enc32to5lo ENCg1 (I[ 63:32],g1A,g0eo,g1eo,g1gs);
enc32to5lo ENCg0 (I[ 31: 0],g0A,  ei,g0eo,g0gs);

//Group Select
stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A6
stdor2 OR_B (g3gs,g2gs,A[6]);

//A5
stdor2 OR_C (g3gs,g1gs,A[5]);

//A4 - A0
stdor4 OR_D (g0A[4],g1A[4],g2A[4],g3A[4],A[4]);
stdor4 OR_E (g0A[3],g1A[3],g2A[3],g3A[3],A[3]);
stdor4 OR_F (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_G (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_H (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);

endmodule 
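
The cascade above enables group 0 first and passes the enable upward only when a group finds no asserted input, so at most one 32-bit group drives its outputs. The intended overall behavior can be sketched with a loop. The module name enc128to7lo_sketch is hypothetical, and the treatment of ei, eo, and gs assumes the same conventions as the 8-to-3 truth table given later in this appendix (all outputs low when ei is low; eo high only when enabled with no input asserted).

```verilog
// Behavioral sketch of a 128-to-7 priority encoder, lowest index wins.
module enc128to7lo_sketch (I, A, ei, eo, gs);
  input  [127:0] I;
  input          ei;
  output [6:0]   A;
  output         eo, gs;

  reg [6:0] A;
  reg       found;
  integer   k;

  always @(I or ei) begin
    A     = 7'd0;
    found = 1'b0;
    // Scan from the highest index down so that the lowest asserted
    // index is assigned last and therefore takes priority.
    for (k = 127; k >= 0; k = k - 1)
      if (ei && I[k]) begin
        A     = k[6:0];
        found = 1'b1;
      end
  end

  assign gs = found;        // "got something"
  assign eo = ei & ~found;  // enable passed on when nothing was found
endmodule
```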


5.    Thirty-Two-Input,  Five-Output  Encoder,  Priority  to 
Low  Bits 


32  TO  5  ENCODER,  PRIORITY  LOW 

Filename:  enc32to5lo.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  13FEB96 

Purpose:  This  structural  model  is  a  32-bit  input,  5-bit  output  priority  encoder.  The  highest  priority  is  given  to  the 
bit  with  the  lowest  index.  Inputs  and  outputs  are  active  high. 

This  module  is  composed  of  four  8  to  3  priority  encoders  and  the  logic  gates  necessary  to  connect  them 
together.  This  module  is  a  building  block  for  the  128  to  7  priority  encoder. 

module enc32to5lo (i,A,ei,eo,gs);

// epoch set_attribute FIXEDBLOCK = 1

input [31:0] i;
input  ei; 
output  [4:0]  A; 
output  gs,eo; 

wire [2:0] g0A,g1A,g2A,g3A;
wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc8to3lo ENCg3 (i[31:24],g3A,g2eo,  eo,g3gs);
enc8to3lo ENCg2 (i[23:16],g2A,g1eo,g2eo,g2gs);
enc8to3lo ENCg1 (i[15: 8],g1A,g0eo,g1eo,g1gs);
enc8to3lo ENCg0 (i[ 7: 0],g0A,  ei,g0eo,g0gs);

//Group Select
stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A4
stdor2 OR_B (g3gs,g2gs,A[4]);

//A3
stdor2 OR_C (g3gs,g1gs,A[3]);

//A2 - A0
stdor4 OR_D (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_E (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_F (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);

endmodule 


6.    Eight-Input,  Three-Output  Encoder,  Priority  to  Low 
Bits 


8  TO  3  ENCODER,  PRIORITY  LOW 

Filename:  enc8to3lo.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  13FEB96 

Purpose:  This  structural  model  is  an  8-bit  input,  3-bit  output  priority  encoder.  The  highest  priority  is  given  to  the 
bit  with  the  lowest  index.  Inputs  and  outputs  are  active  high. 

Truth table

          Inputs                   Outputs
EI  I7 I6 I5 I4 I3 I2 I1 I0    A2 A1 A0  GS EO

0   x  x  x  x  x  x  x  x     0  0  0   0  0
1   1  0  0  0  0  0  0  0     1  1  1   1  0
1   x  1  0  0  0  0  0  0     1  1  0   1  0
1   x  x  1  0  0  0  0  0     1  0  1   1  0
1   x  x  x  1  0  0  0  0     1  0  0   1  0
1   x  x  x  x  1  0  0  0     0  1  1   1  0
1   x  x  x  x  x  1  0  0     0  1  0   1  0
1   x  x  x  x  x  x  1  0     0  0  1   1  0
1   x  x  x  x  x  x  x  1     0  0  0   1  0
1   0  0  0  0  0  0  0  0     0  0  0   0  1

module enc8to3lo (I,A,EI,EO,GS);

// epoch set_attribute FIXEDBLOCK = 1

input  [7:0]  I; 
input  EI; 
output  [2:0]  A; 
output  GS,EO; 
wire [7:0] I_;
wire [2:0] A;
wire EI_,GS,EO,EO_;
supply1 VDD;

//Standard cell implementation is more efficient here. See User Man. 5-34.
inv #(8,0,"AUTO","1") INV1 (.IN0(I),.Y(I_));
stdinv INV_AA (.IN0(EI),.Y(EI_));

//Enable Output
stdand4 AND_A (EI,I_[7],I_[6],I_[5],w1);
stdand4 AND_B (I_[4],I_[3],I_[2],I_[1],w2);
stdand3 AND_C (I_[0],w1,w2,EO);

//Group Select ("Got Something")
stdnor2 NOR_D (EI_,EO,GS);

//Encode A2 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I5.I4_.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//          = EI.(I7.I3_.I2_.I1_.I0_ + I6.I3_.I2_.I1_.I0_ +
//                I5.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//          = EI.I3_.I2_.I1_.I0_.(I7 + I6 + I5 + I4)

stdor4  OR_E   (I[7],I[6],I[5],I[4],w5);
stdand4 AND_F  (EI,I_[3],I_[2],I_[1],w6);
stdand3 AND_FA (I_[0],w5,w6,A[2]);

//Encode A1 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I2.I1_.I0_)
//          = EI.I1_.I0_.(I7.I5_.I4_ + I6.I5_.I4_ + I3 + I2)

stdand3 AND_G (I[7],I_[5],I_[4],w10);
stdand3 AND_H (I[6],I_[5],I_[4],w11);
stdor4  OR_I  (w10,w11,I[3],I[2],w12);
stdand4 AND_J (EI,I_[1],I_[0],w12,A[1]);

//Encode A0 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I5.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I1.I0_)
//          = EI.I0_.(I7.I6_.I4_.I2_ + I5.I4_.I2_ + I3.I2_ + I1)

stdand4 AND_K (I[7],I_[6],I_[4],I_[2],w15);
stdand3 AND_L (I[5],I_[4],I_[2],w16);
stdand2 AND_M (I[3],I_[2],w17);
stdor4  OR_N  (w15,w16,w17,I[1],w18);
stdand3 AND_P (EI,I_[0],w18,A[0]);

endmodule 
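
When simulating the gate-level encoder above, a behavioral reference can be handy for comparing outputs. The following is a minimal sketch, not part of the original design: the module name enc8to3lo_ref and the small testbench are hypothetical. It assumes the behavior given in the truth table (lowest-index input wins, EO is high only when EI is high and no input is set).

```verilog
// Hypothetical behavioral reference for the 8-to-3 encoder, priority to low bits.
module enc8to3lo_ref (I, A, EI, EO, GS);
  input  [7:0] I;
  input        EI;
  output [2:0] A;
  output       GS, EO;
  reg    [2:0] A;
  reg          GS, EO;
  integer      k;

  always @(I or EI) begin
    A  = 3'b000;
    GS = 1'b0;
    EO = 1'b0;
    if (EI) begin
      if (I == 8'b0)
        EO = 1'b1;                 // enabled, but nothing to encode
      else begin
        GS = 1'b1;                 // "got something"
        // Scan from high to low so the lowest set index is assigned last.
        for (k = 7; k >= 0; k = k - 1)
          if (I[k]) A = k;
      end
    end
  end
endmodule

// Hypothetical testbench: prints the outputs for two input patterns.
module enc8to3lo_ref_tb;
  reg  [7:0] I;
  reg        EI;
  wire [2:0] A;
  wire       GS, EO;

  enc8to3lo_ref dut (I, A, EI, EO, GS);

  initial begin
    EI = 1'b1;
    I  = 8'b1010_0100;             // lowest set bit is I2
    #1 $display("A=%d GS=%b EO=%b", A, GS, EO);
    I  = 8'b0000_0000;             // no request
    #1 $display("A=%d GS=%b EO=%b", A, GS, EO);
  end
endmodule
```

Driving both this model and the structural enc8to3lo with the same vectors and comparing A, GS and EO is one way to check the reconstructed gate equations.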
7.  Line Replacement Unit


LINE  REPLACEMENT  UNIT 
Filename:  line_replacement_unit.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  13FEB96 

Purpose: This structural model determines the next line to be replaced whenever the PRC needs to start a new
line. It first selects invalid lines. If all the lines are valid, then it selects lines that have been "aged". A priority
encoder (ENC1) chooses the line with the lowest index among all the lines that can be replaced. If all lines are
valid, the encoder's enable output (eo) signal is used to cause aging. A line X can be replaced if the following holds
true for that line:

not (X=ActiveLine) AND {not Valid[X] OR (all_lines_valid AND Aged[X])}

Aging is accomplished by the use of a 7-bit counter (ager_counter), initially set to zero. When the
cause_aging signal from the encoder is high, the counter advances. A decoder (DEC_B) output causes the
appropriate Aged flag to be set.

module line_replacement_unit (Valid,ActiveLine,all_lines_valid,
                              new_replace,fetch_done,HRESET_,CLK,ReplaceLine);
// epoch set_attribute FIXEDBLOCK = 1

input [127:0] Valid;
input [6:0] ActiveLine;
input all_lines_valid,new_replace,fetch_done,HRESET_,CLK;
output [6:0] ReplaceLine;

supply1 Vdd;

wire [127:0] w1,w2,w4,w5,w6,w7,set_,reset_,Aged,fetch_done128,
             all_lines_valid128,HRESET128_;
wire [6:0] ager_line,ReplaceLine,HRESET7_;
wire ager_en,cause_aging,latch_en,latch_en_buf,nc1,nc2;
split128 fetch_done_split (fetch_done,fetch_done128);
split128 alv_split (all_lines_valid,all_lines_valid128);
split128 HRESET_split (HRESET_,HRESET128_);
split7 HRESET_split7 (HRESET_,HRESET7_);

decoder #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_A (.SEL({Vdd,ActiveLine[0],ActiveLine[1],ActiveLine[2],
               ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
         .Y(w1));
decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_B (.SEL({Vdd,ager_line[0],ager_line[1],ager_line[2],
               ager_line[3],ager_line[4],ager_line[5],ager_line[6]}),.YBAR(set_));
nand2 #(128,0,"AUTO","1") NAND_A (.IN0(w1),.IN1(fetch_done128),.Y(w2));
and2 #(128,0,"AUTO","1") AND_E (.IN0(w2),.IN1(HRESET128_),.Y(reset_));
srlatch128 AgedReg (set_,reset_,Aged);
scntr_c #(7,0,"AUTO","1")
  ager_counter (.CLK(ager_en),.CLR(HRESET_),.EN(Vdd),.COUT(nc2),
                .Q(ager_line));
stdand2 AND_F (.IN0(CLK),.IN1(cause_aging),.Y(ager_en));
nand2 #(128,0,"AUTO","1")
  NAND_B (.IN0(all_lines_valid128),.IN1(Aged),.Y(w4));
and2 #(128,0,"AUTO","1") AND_C (.IN0(w4),.IN1(Valid),.Y(w5));
nor2 #(128,0,"AUTO","1") NOR_D (.IN0(w1),.IN1(w5),.Y(w6));
stdor2 OR_F (.IN0(new_replace),.IN1(cause_aging),.Y(latch_en));
stdbuf #("19") LatchEnableBuffer (.IN0(latch_en),.Y(latch_en_buf));
enc128to7lo ENC1 (.I(w7),.A(ReplaceLine),.ei(Vdd),
                  .eo(cause_aging),.gs(nc1));
latch_c #(128,0,"AUTO","1")
  ReplaceLineLatch (.EN(latch_en_buf),.CLR(HRESET_),.D(w6),.Q(w7));


endmodule 
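
The replacement condition stated in the header, not (X=ActiveLine) AND {not Valid[X] OR (all_lines_valid AND Aged[X])}, can also be written behaviorally. The sketch below covers only the combinational candidate selection and the lowest-index choice; it leaves out the latch, the aging counter and the Aged register. The module name replace_select_ref and the output none_replaceable (which plays the role of the encoder's eo output) are hypothetical.

```verilog
// Hypothetical behavioral sketch of the replacement candidate selection.
// Latching and aging are deliberately not modeled here.
module replace_select_ref (Valid, Aged, ActiveLine, all_lines_valid,
                           ReplaceLine, none_replaceable);
  input  [127:0] Valid, Aged;
  input  [6:0]   ActiveLine;
  input          all_lines_valid;
  output [6:0]   ReplaceLine;
  output         none_replaceable;
  reg    [6:0]   ReplaceLine;
  reg            none_replaceable;
  reg    [127:0] candidate;
  integer        x;

  always @(Valid or Aged or ActiveLine or all_lines_valid) begin
    // A line is a candidate if it is not the active line and is either
    // invalid or (when every line is valid) already aged.
    for (x = 0; x < 128; x = x + 1)
      candidate[x] = (x != ActiveLine) &&
                     (!Valid[x] || (all_lines_valid && Aged[x]));

    // Corresponds to the encoder's eo output, which triggers aging.
    none_replaceable = (candidate == 128'b0);

    // Priority to the lowest index: scan downward so index 0 wins.
    ReplaceLine = 7'd0;
    for (x = 127; x >= 0; x = x - 1)
      if (candidate[x]) ReplaceLine = x;
  end
endmodule
```

The candidate vector here corresponds to w6 in the structural model, which is computed as NOR(w1, Valid AND NOT(all_lines_valid AND Aged)).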


8.  OR Gate With 128 Inputs, One Output


128-INPUT OR GATE
Filename: or128.v
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  23JAN96 


**  ***  ************************************************************  *********** 

module or128 (in,out); //An OR tree, equivalent to a 128-input OR gate.
input [127:0] in;
output out;
wire out;

wire [31:0] A;
wire [7:0] B;
wire [1:0] C;

or4 #(32,0,"AUTO","1") OR_A (.IN0(in[127:96]),.IN1(in[95:64]),
                             .IN2(in[63:32]),.IN3(in[31:0]),.Y(A));
or4 #(8,0,"AUTO","1") OR_B (.IN0(A[31:24]),.IN1(A[23:16]),
                            .IN2(A[15:8]),.IN3(A[7:0]),.Y(B));
or4 #(2,0,"AUTO","1") OR_C (.IN0(B[7:6]),.IN1(B[5:4]),
                            .IN2(B[3:2]),.IN3(B[1:0]),.Y(C));
stdor2 OR_D (.IN0(C[1]),.IN1(C[0]),.Y(out));

endmodule
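
For behavioral simulation, the four-level OR tree above is logically the same as a single reduction OR. The following minimal sketch uses a hypothetical module name, or128_ref, and says nothing about layout or timing, which are the reasons a structural tree is used in the first place.

```verilog
// Hypothetical behavioral equivalent of the 128-input OR tree.
module or128_ref (in, out);
  input  [127:0] in;
  output         out;

  assign out = |in;   // reduction OR over all 128 bits
endmodule
```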


9.  Predicted Memory Address List


PREDICTED MEMORY ADDRESS LIST

Filename: predma_list.v
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  13FEB96 

Purpose:  This  structural  model  is  a  register  file  for  storing  predicted  memory  addresses.  This  list  is  composed  of 
128  address  registers,  128  equality  comparators,  and  128  Valid  status  flags.  The  NAR  is  stored  in  this  list  at  the 
fetch_done  pulse.  If  there  is  a  match  with  the  input  address  (in_addr),  a  priority  encoder  (ENC_C)  determines 
which  line  matches. 

module predicted_ma_list (NAR,in_addr,ActiveLine,fetch_done,flush,HRESET_,
                          Valid,match_line,match);
// epoch set_attribute FIXEDBLOCK = 1

input [26:0] NAR, in_addr;
input [6:0] ActiveLine;
input fetch_done,flush,HRESET_;
output [127:0] Valid;
output [6:0] match_line;
output match;

wire [127:0] store_en,store_en_buf,flush_enable_,set_,reset_,
             Valid,equal,m,HRESET128_;
wire nc1,nc2;
supply1 Vdd;

split128 hreset_splitter (.in(HRESET_),.out(HRESET128_));
decoder #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_A (.SEL({fetch_done,ActiveLine[0],ActiveLine[1],ActiveLine[2],
               ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
         .Y(store_en));
buff #(128,0,"AUTO","8") StoreEnBuffer (.IN0(store_en),.Y(store_en_buf));
decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_B (.SEL({flush,ActiveLine[0],ActiveLine[1],ActiveLine[2],
               ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
         .YBAR(flush_enable_));
inv #(128,0,"AUTO","1") INV_A (.IN0(store_en_buf),.Y(set_));
and2 #(128,0,"AUTO","1")
  AND_B (.IN0(flush_enable_),.IN1(HRESET128_),.Y(reset_));
srlatch128 Valid_latch (.S_(set_),.R_(reset_),.Q(Valid));
and2 #(128,0,"AUTO","1")
  AND_C (.IN0(Valid),.IN1(equal),.Y(m));
or128 MATCH_OR (.in(m),.out(match));
enc128to7lo ENC_C (.I(m),.A(match_line),.ei(Vdd),.eo(nc1),
                   .gs(nc2));

//  epoch  precompiled  addre 

addre PredMA0 (NAR,in_addr,store_en_buf[0],equal[0],HRESET_);
addre PredMA1 (NAR,in_addr,store_en_buf[1],equal[1],HRESET_);
addre PredMA2 (NAR,in_addr,store_en_buf[2],equal[2],HRESET_);
addre PredMA3 (NAR,in_addr,store_en_buf[3],equal[3],HRESET_);
addre PredMA4 (NAR,in_addr,store_en_buf[4],equal[4],HRESET_);
addre PredMA5 (NAR,in_addr,store_en_buf[5],equal[5],HRESET_);
addre PredMA6 (NAR,in_addr,store_en_buf[6],equal[6],HRESET_);
addre PredMA7 (NAR,in_addr,store_en_buf[7],equal[7],HRESET_);
addre PredMA8 (NAR,in_addr,store_en_buf[8],equal[8],HRESET_);
addre PredMA9 (NAR,in_addr,store_en_buf[9],equal[9],HRESET_);
addre PredMA10 (NAR,in_addr,store_en_buf[10],equal[10],HRESET_);
addre PredMA11 (NAR,in_addr,store_en_buf[11],equal[11],HRESET_);
addre PredMA12 (NAR,in_addr,store_en_buf[12],equal[12],HRESET_);
addre PredMA13 (NAR,in_addr,store_en_buf[13],equal[13],HRESET_);
addre PredMA14 (NAR,in_addr,store_en_buf[14],equal[14],HRESET_);
addre PredMA15 (NAR,in_addr,store_en_buf[15],equal[15],HRESET_);
addre PredMA16 (NAR,in_addr,store_en_buf[16],equal[16],HRESET_);
addre PredMA17 (NAR,in_addr,store_en_buf[17],equal[17],HRESET_);
addre PredMA18 (NAR,in_addr,store_en_buf[18],equal[18],HRESET_);
addre PredMA19 (NAR,in_addr,store_en_buf[19],equal[19],HRESET_);
addre PredMA20 (NAR,in_addr,store_en_buf[20],equal[20],HRESET_);
addre PredMA21 (NAR,in_addr,store_en_buf[21],equal[21],HRESET_);
addre PredMA22 (NAR,in_addr,store_en_buf[22],equal[22],HRESET_);
addre PredMA23 (NAR,in_addr,store_en_buf[23],equal[23],HRESET_);
addre PredMA24 (NAR,in_addr,store_en_buf[24],equal[24],HRESET_);
addre PredMA25 (NAR,in_addr,store_en_buf[25],equal[25],HRESET_);
addre PredMA26 (NAR,in_addr,store_en_buf[26],equal[26],HRESET_);
addre PredMA27 (NAR,in_addr,store_en_buf[27],equal[27],HRESET_);
addre PredMA28 (NAR,in_addr,store_en_buf[28],equal[28],HRESET_);
addre PredMA29 (NAR,in_addr,store_en_buf[29],equal[29],HRESET_);
addre PredMA30 (NAR,in_addr,store_en_buf[30],equal[30],HRESET_);
addre PredMA31 (NAR,in_addr,store_en_buf[31],equal[31],HRESET_);
addre PredMA32 (NAR,in_addr,store_en_buf[32],equal[32],HRESET_);
addre PredMA33 (NAR,in_addr,store_en_buf[33],equal[33],HRESET_);
addre PredMA34 (NAR,in_addr,store_en_buf[34],equal[34],HRESET_);
addre PredMA35 (NAR,in_addr,store_en_buf[35],equal[35],HRESET_);
addre PredMA36 (NAR,in_addr,store_en_buf[36],equal[36],HRESET_);
addre PredMA37 (NAR,in_addr,store_en_buf[37],equal[37],HRESET_);
addre PredMA38 (NAR,in_addr,store_en_buf[38],equal[38],HRESET_);
addre PredMA39 (NAR,in_addr,store_en_buf[39],equal[39],HRESET_);
addre PredMA40 (NAR,in_addr,store_en_buf[40],equal[40],HRESET_);
addre PredMA41 (NAR,in_addr,store_en_buf[41],equal[41],HRESET_);
addre PredMA42 (NAR,in_addr,store_en_buf[42],equal[42],HRESET_);
addre PredMA43 (NAR,in_addr,store_en_buf[43],equal[43],HRESET_);
addre PredMA44 (NAR,in_addr,store_en_buf[44],equal[44],HRESET_);
addre PredMA45 (NAR,in_addr,store_en_buf[45],equal[45],HRESET_);
addre PredMA46 (NAR,in_addr,store_en_buf[46],equal[46],HRESET_);
addre PredMA47 (NAR,in_addr,store_en_buf[47],equal[47],HRESET_);
addre PredMA48 (NAR,in_addr,store_en_buf[48],equal[48],HRESET_);
addre PredMA49 (NAR,in_addr,store_en_buf[49],equal[49],HRESET_);
addre PredMA50 (NAR,in_addr,store_en_buf[50],equal[50],HRESET_);
addre PredMA51 (NAR,in_addr,store_en_buf[51],equal[51],HRESET_);
addre PredMA52 (NAR,in_addr,store_en_buf[52],equal[52],HRESET_);
addre PredMA53 (NAR,in_addr,store_en_buf[53],equal[53],HRESET_);
addre PredMA54 (NAR,in_addr,store_en_buf[54],equal[54],HRESET_);
addre PredMA55 (NAR,in_addr,store_en_buf[55],equal[55],HRESET_);
addre PredMA56 (NAR,in_addr,store_en_buf[56],equal[56],HRESET_);
addre PredMA57 (NAR,in_addr,store_en_buf[57],equal[57],HRESET_);
addre PredMA58 (NAR,in_addr,store_en_buf[58],equal[58],HRESET_);
addre PredMA59 (NAR,in_addr,store_en_buf[59],equal[59],HRESET_);
addre PredMA60 (NAR,in_addr,store_en_buf[60],equal[60],HRESET_);
addre PredMA61 (NAR,in_addr,store_en_buf[61],equal[61],HRESET_);
addre PredMA62 (NAR,in_addr,store_en_buf[62],equal[62],HRESET_);
addre PredMA63 (NAR,in_addr,store_en_buf[63],equal[63],HRESET_);
addre PredMA64 (NAR,in_addr,store_en_buf[64],equal[64],HRESET_);
addre PredMA65 (NAR,in_addr,store_en_buf[65],equal[65],HRESET_);
addre PredMA66 (NAR,in_addr,store_en_buf[66],equal[66],HRESET_);
addre PredMA67 (NAR,in_addr,store_en_buf[67],equal[67],HRESET_);
addre PredMA68 (NAR,in_addr,store_en_buf[68],equal[68],HRESET_);
addre PredMA69 (NAR,in_addr,store_en_buf[69],equal[69],HRESET_);
addre PredMA70 (NAR,in_addr,store_en_buf[70],equal[70],HRESET_);
addre PredMA71 (NAR,in_addr,store_en_buf[71],equal[71],HRESET_);
addre PredMA72 (NAR,in_addr,store_en_buf[72],equal[72],HRESET_);
addre PredMA73 (NAR,in_addr,store_en_buf[73],equal[73],HRESET_);
addre PredMA74 (NAR,in_addr,store_en_buf[74],equal[74],HRESET_);
addre PredMA75 (NAR,in_addr,store_en_buf[75],equal[75],HRESET_);
addre PredMA76 (NAR,in_addr,store_en_buf[76],equal[76],HRESET_);
addre PredMA77 (NAR,in_addr,store_en_buf[77],equal[77],HRESET_);
addre PredMA78 (NAR,in_addr,store_en_buf[78],equal[78],HRESET_);
addre PredMA79 (NAR,in_addr,store_en_buf[79],equal[79],HRESET_);
addre PredMA80 (NAR,in_addr,store_en_buf[80],equal[80],HRESET_);
addre PredMA81 (NAR,in_addr,store_en_buf[81],equal[81],HRESET_);
addre PredMA82 (NAR,in_addr,store_en_buf[82],equal[82],HRESET_);
addre PredMA83 (NAR,in_addr,store_en_buf[83],equal[83],HRESET_);
addre PredMA84 (NAR,in_addr,store_en_buf[84],equal[84],HRESET_);
addre PredMA85 (NAR,in_addr,store_en_buf[85],equal[85],HRESET_);
addre PredMA86 (NAR,in_addr,store_en_buf[86],equal[86],HRESET_);
addre PredMA87 (NAR,in_addr,store_en_buf[87],equal[87],HRESET_);
addre PredMA88 (NAR,in_addr,store_en_buf[88],equal[88],HRESET_);
addre PredMA89 (NAR,in_addr,store_en_buf[89],equal[89],HRESET_);
addre PredMA90 (NAR,in_addr,store_en_buf[90],equal[90],HRESET_);
addre PredMA91 (NAR,in_addr,store_en_buf[91],equal[91],HRESET_);
addre PredMA92 (NAR,in_addr,store_en_buf[92],equal[92],HRESET_);
addre PredMA93 (NAR,in_addr,store_en_buf[93],equal[93],HRESET_);
addre PredMA94 (NAR,in_addr,store_en_buf[94],equal[94],HRESET_);
addre PredMA95 (NAR,in_addr,store_en_buf[95],equal[95],HRESET_);
addre PredMA96 (NAR,in_addr,store_en_buf[96],equal[96],HRESET_);
addre PredMA97 (NAR,in_addr,store_en_buf[97],equal[97],HRESET_);
addre PredMA98 (NAR,in_addr,store_en_buf[98],equal[98],HRESET_);
addre PredMA99 (NAR,in_addr,store_en_buf[99],equal[99],HRESET_);
addre PredMA100 (NAR,in_addr,store_en_buf[100],equal[100],HRESET_);
addre PredMA101 (NAR,in_addr,store_en_buf[101],equal[101],HRESET_);
addre PredMA102 (NAR,in_addr,store_en_buf[102],equal[102],HRESET_);
addre PredMA103 (NAR,in_addr,store_en_buf[103],equal[103],HRESET_);
addre PredMA104 (NAR,in_addr,store_en_buf[104],equal[104],HRESET_);
addre PredMA105 (NAR,in_addr,store_en_buf[105],equal[105],HRESET_);
addre PredMA106 (NAR,in_addr,store_en_buf[106],equal[106],HRESET_);
addre PredMA107 (NAR,in_addr,store_en_buf[107],equal[107],HRESET_);
addre PredMA108 (NAR,in_addr,store_en_buf[108],equal[108],HRESET_);
addre PredMA109 (NAR,in_addr,store_en_buf[109],equal[109],HRESET_);
addre PredMA110 (NAR,in_addr,store_en_buf[110],equal[110],HRESET_);
addre PredMA111 (NAR,in_addr,store_en_buf[111],equal[111],HRESET_);
addre PredMA112 (NAR,in_addr,store_en_buf[112],equal[112],HRESET_);
addre PredMA113 (NAR,in_addr,store_en_buf[113],equal[113],HRESET_);
addre PredMA114 (NAR,in_addr,store_en_buf[114],equal[114],HRESET_);
addre PredMA115 (NAR,in_addr,store_en_buf[115],equal[115],HRESET_);
addre PredMA116 (NAR,in_addr,store_en_buf[116],equal[116],HRESET_);
addre PredMA117 (NAR,in_addr,store_en_buf[117],equal[117],HRESET_);
addre PredMA118 (NAR,in_addr,store_en_buf[118],equal[118],HRESET_);
addre PredMA119 (NAR,in_addr,store_en_buf[119],equal[119],HRESET_);
addre PredMA120 (NAR,in_addr,store_en_buf[120],equal[120],HRESET_);
addre PredMA121 (NAR,in_addr,store_en_buf[121],equal[121],HRESET_);
addre PredMA122 (NAR,in_addr,store_en_buf[122],equal[122],HRESET_);
addre PredMA123 (NAR,in_addr,store_en_buf[123],equal[123],HRESET_);
addre PredMA124 (NAR,in_addr,store_en_buf[124],equal[124],HRESET_);
addre PredMA125 (NAR,in_addr,store_en_buf[125],equal[125],HRESET_);
addre PredMA126 (NAR,in_addr,store_en_buf[126],equal[126],HRESET_);
addre PredMA127 (NAR,in_addr,store_en_buf[127],equal[127],HRESET_);

endmodule 
10.  One-to-128 Wire Splitter


1 TO 128 WIRE SPLITTER
Filename: split128.v
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  23JAN96 

Purpose:  Splits  a  wire  into  128  wires. 
module split128 (in,out); //Splits a wire into 128 wires.
input in;
output [127:0] out;


assign out = {in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
              in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in};
endmodule 


11.   One-to-Seven  Wire  Splitter 


1  TO  7  WIRE  SPLITTER 
Filename:  split7.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  23JAN96 


Purpose:  Splits  a  wire  into  7  wires. 

module split7 (in,out); //Splits a wire into 7 wires.
input  in; 
output  [6:0]  out; 

assign  out  =  {in,in,in,in,in,in,in}; 

endmodule 
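
Both wire splitters do the same thing at different widths, so a single parameterized module using the replication operator is one possible alternative. This is only a sketch: the module name split_ref and the parameter WIDTH are hypothetical, and whether the layout tool accepts this form in place of the explicit concatenation is not confirmed here.

```verilog
// Hypothetical parameterized splitter built on the replication operator.
module split_ref (in, out);
  parameter WIDTH = 128;
  input              in;
  output [WIDTH-1:0] out;

  assign out = {WIDTH{in}};   // WIDTH copies of the single input bit
endmodule
```

Under these assumptions, a seven-wire splitter would be instantiated as `split_ref #(7) HRESET_split7 (HRESET_, HRESET7_);`.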

12.      Set,    Reset   Latch 


STANDARD SET,RESET LATCH

Filename: srlatch.v
Author: Joseph R. Robert, Jr.
Date:     21DEC95
Revised:  23JAN96

Reset has priority.

****************************************************************** 
module  srlatch  (S_,  R_,  Q); 

input  S_,R_; 

output  Q; 

wire w1,w2,Q,Q_;

stdnand2 NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
stdnand2 NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
stdnand2 NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
stdinv INV_D (.IN0(S_),.Y(w1));

endmodule 
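
The NAND network above implements an active-low set/reset latch in which reset wins when both inputs are asserted. A behavioral model with the same intended function is sketched below for use as a simulation reference. The module name srlatch_ref is hypothetical, and gate delays are not modeled.

```verilog
// Hypothetical behavioral model of the active-low set/reset latch.
module srlatch_ref (S_, R_, Q);
  input  S_, R_;
  output Q;
  reg    Q;

  always @(S_ or R_) begin
    if (!R_)
      Q = 1'b0;        // reset has priority, even if S_ is also low
    else if (!S_)
      Q = 1'b1;        // set
    // otherwise (both inputs high) Q holds its previous value
  end
endmodule
```

As in the structural version, Q is unknown until the first set or reset occurs.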


13.       Set,    Reset   Latch  Array   128   Bits   Wide 


ARRAY  OF  128  SET,RESET  LATCHES 

Filename: srlatch128.v
Author: Joseph R. Robert, Jr.
Date:     21DEC95
Revised:  23JAN96


module srlatch128 (S_, R_, Q); //set-reset latch
input [127:0] S_,R_;
output [127:0] Q;
wire [127:0] w1,w2,Q,Q_;

nand2 #(128,0,"AUTO","1") NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
nand2 #(128,0,"AUTO","1") NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
nand2 #(128,0,"AUTO","1") NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
inv #(128,0,"AUTO","1") INV_C (.IN0(S_),.Y(w1));

endmodule 


PREDICTOR 


/****************************************************************************** 

PREDICTOR 

Filename:  predictor.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     21DEC95 
Revised:  06FEB96

Purpose: This module calculates the Next Address (stored in NAR) based on the Most Recent Memory Access
(MRMA) and the Current Address (in the CAR). The prediction calculation is

NAR = 2*CAR - MRMA

In this structural implementation of the Predictor, the predict signal latches in the CAR and MRMA inputs. The
subtraction is accomplished as a 2's complement addition with a high-speed adder.

The CAR is multiplied by 2 by concatenating a zero at the least significant end. The most significant bit of the
CAR is not retained, since it will not have an effect on the 27-bit output of the adder. This would adversely affect
address prediction only around the mid-point of the 4 gigabytes of memory. The Golden Rule here is "Design for
the common case."

A number is negated in 2's complement by inverting all the bits and adding 1. The MRMA is negated by inverting
all its bits. Adding the required 1 is implemented as a Carry-In to the adder.

Epoch's  TACTIC  reported  the  propagation  delay  from  predict  to  NAR  to  be  4.90  ns. 

module predictor (MRMA,CAR,predict,NAR,HRESET_);

//CAR is [30:5] of 32-bit address
//MRMA and NAR are [31:5] of 32-bit address

// epoch set_attribute FIXEDBLOCK = 1

input [26:0] MRMA;
input [25:0] CAR;
input predict,HRESET_;
output [26:0] NAR;

wire [26:0] NAR,A,B,C;
wire nc;

`define group "predictor"

supply0 gnd;
supply1 vdd;

assign A[0] = gnd;
dff_c #(26,1,`group,"1")
  CAR_latch (.D(CAR),.CLK(predict),.CLR(HRESET_),.Q(A[26:1]));

dff_c #(27,1,`group,"1")
  MRMA_latch (.D(MRMA),.CLK(predict),.CLR(HRESET_),.Q(C));

bufinv #(27,1,`group,"1","speed")
  bit_complement (.IN0(C),.Y(B));
addhs #(27,1,`group,"1")
  adder (.A(A),.B(B),.CIN(vdd),.COUT(nc),.SUM(NAR));

endmodule 
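
The prediction NAR = 2*CAR - MRMA and its 2's complement implementation can be illustrated with a short behavioral model. This is a sketch under stated assumptions, not the original design: the names predictor_ref and predictor_ref_tb are hypothetical, the registers are assumed to capture on the rising edge of predict, and the clear is assumed to be asynchronous and active low.

```verilog
// Hypothetical behavioral model of the predictor datapath.
module predictor_ref (MRMA, CAR, predict, NAR, HRESET_);
  input  [26:0] MRMA;
  input  [25:0] CAR;
  input         predict, HRESET_;
  output [26:0] NAR;
  reg    [26:0] a, c;       // latched 2*CAR and latched MRMA

  always @(posedge predict or negedge HRESET_)
    if (!HRESET_) begin
      a <= 27'b0;
      c <= 27'b0;
    end else begin
      a <= {CAR, 1'b0};     // multiply by 2: append a zero at the LSB
      c <= MRMA;
    end

  // 2's complement subtraction: a + ~c + 1, truncated to 27 bits.
  assign NAR = a + ~c + 27'd1;
endmodule

// Hypothetical testbench: MRMA = 100, CAR = 104, so the stride is 4.
module predictor_ref_tb;
  reg  [26:0] MRMA;
  reg  [25:0] CAR;
  reg         predict, HRESET_;
  wire [26:0] NAR;

  predictor_ref dut (MRMA, CAR, predict, NAR, HRESET_);

  initial begin
    predict = 1'b0; HRESET_ = 1'b0;
    MRMA = 27'd100; CAR = 26'd104;
    #1 HRESET_ = 1'b1;      // release reset
    #1 predict = 1'b1;      // latch the inputs
    #1 $display("NAR = %d", NAR);
  end
endmodule
```

With these inputs the model computes 2*104 - 100 = 108, which is the next address in a stride of 4.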


DATA    LIST 


DATA  LIST 
Filename:  datalist.v 
Author:  Joseph  R.  Robert,  Jr. 
Date:     15DEC95 
Revised:  07FEB96 

Purpose: This module stores the data retrieved from memory in anticipation of a request by the CPU.

The basic memory cell is Epoch's hsramoe (high speed ram with output enable). Since each hsram has
a maximum word size of 128 bits, there are two hsram parts in parallel to get the required 256-bit width.

An upload signal causes the Data List to store the data on data_line into the address specified by
ActiveLine. The input upload has to be inverted to match the active-low WR input of the Epoch hsram component.

A download signal causes the Data List to assert onto data_line the data in the address specified by
ActiveLine. This signal also has to be inverted for the same reason.

Both  the  inverters  can  probably  be  removed  if  the  Bus  Interface  Unit  makes  the  upload  and  download 
signals  active  low.  That  could  only  improve  the  response  time  of  this  data  memory. 

Epoch  calculated  the  following  timing  delays: 

download  ->  hsramoe.DOUT  2.3  ns 
ActiveLine  ->  hsramoe.DOUT  7.3  ns 

A  design  alternative  is  to  use  the  regular  speed  version,  ramoe,  with  the  following  timing  delays. 

download  ->  ramoe.DOUT  4  ns 
ActiveLine  ->  ramoe.DOUT  16  ns 

Using this slower RAM is possible, but would require a significant modification to the PRC behavior to handle the
longer delay, and would add a cycle delay to CPU reads when there is a hit in the PRC.

Putting this module's VerilogOut file into the original PRC behavioral model for mixed-mode simulation
caused a timing error that had to be corrected in the Bus Interface Unit. After an upload to the DataList, data_line
must remain valid for long enough to meet the data hold time requirement of Epoch's hsramoe.

******************************************************************************/

module datalist (data_line,ActiveLine,upload,download);
// epoch set_attribute FIXEDBLOCK = 1

input [6:0] ActiveLine;
input upload,download;
inout [255:0] data_line;

wire [255:0] data_line;
wire write_,enable_;

//STRUCTURE

stdbufinv upload_inv (.IN0(upload),.Y(write_));

stdbufinv download_inv (.IN0(download),.Y(enable_));

hsramoe #(128,128,7,32,1,"1")
  data_ram1 (.A(ActiveLine),.DIN(data_line[127:0]),.DOUT(data_line[127:0]),
             .WR(write_),.OE(enable_));
hsramoe #(128,128,7,32,1,"1")
  data_ram0 (.A(ActiveLine),.DIN(data_line[255:128]),.DOUT(data_line[255:128]),
             .WR(write_),.OE(enable_));

endmodule 
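
For behavioral simulation without the Epoch RAM cells, the Data List can be approximated as a 128-line by 256-bit memory on a bidirectional bus. The sketch below is an assumption-laden stand-in: the module name datalist_ref is hypothetical, the write is modeled as level-sensitive while upload is high, and none of the hsramoe setup, hold or access times are represented.

```verilog
// Hypothetical behavioral model of the Data List: 128 lines of 256 bits.
// Timing of the real RAM cells is not modeled.
module datalist_ref (data_line, ActiveLine, upload, download);
  inout  [255:0] data_line;
  input  [6:0]   ActiveLine;
  input          upload, download;
  reg    [255:0] mem [0:127];

  // Drive the shared bus only while download is asserted.
  assign data_line = download ? mem[ActiveLine] : 256'bz;

  // Store the bus contents into the active line while upload is asserted.
  always @(upload or ActiveLine or data_line)
    if (upload)
      mem[ActiveLine] = data_line;
endmodule
```

The model assumes upload and download are never asserted together, which matches their use as separate store and read commands in the description above.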

G.    BUS  INTERFACE 


*  BUS  INTERFACE  UNIT 

*  Filename:  bus_interface.v 

*  Author:  Joseph  R.  Robert,  Jr. 

*  Date:     09OCT95 

*  Revised:  20MAR96 

Purpose:  This  module  connects  the  PRC  with  the  system  bus.  It  handles  the  protocol  of  data  transfer  in  and  out 
of  the  PRC. 

When this module receives a fetch signal, it latches the address in the NAR, and requests the bus for a
burst read. It stores the incoming data until all four bursts have been received. Then it uploads the data into the
Data List and asserts fetch_done. If there is a parity error during the fetch, the Bus Interface informs the Controller
by asserting fetch_abort, and the transaction is cancelled.

When this module receives a send signal, it sends a cancel signal (CANX) to the memory module,
downloads data from the Data List, and then sends the data to the CPU. When the transfer is finished, it asserts
send_done.

The coordination of these activities is accomplished through the use of two Finite State Machines. One
acts as an address bus master, and the other controls the flow of data.

* 

module bus_interface (NAR_IN,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
                      clk,BR_,upload,download,fetch_done,fetch_abort,
                      send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
                      TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);
// epoch set_attribute FIXEDBLOCK = 1

// Signals are defined in system.v.

input [26:0] NAR_IN;
input [1:0] BURSTSTART;
input BG_,AACK_,DBG_,send,fetch,clk,HRESET_;
output BR_,upload,download,fetch_done,fetch_abort;
output send_done,DPE_,CANX,snoop_ignore;
inout [255:0] DATALINE;
inout [63:0] D;
inout [31:0] A;
inout [7:0] DP;
inout [4:0] TT;
inout [3:0] AP;
inout [2:0] TSIZ;
inout [1:0] TC;
inout ABB_,TS_,TBST_,DBB_,TA_;

tri [255:0] DATALINE;
tri [63:0] D;
tri [31:0] A;
tri [7:0] DP;
tri [4:0] TT;
tri [3:0] AP;
tri [2:0] TSIZ;
tri [1:0] TC;
tri ABB_,TS_,TBST_,DBB_,TA_,DPE_;

supply1 VDD;
supply0 GND;

//Address section wires

wire [26:0] a_reg,NAR;
wire [3:0] ap_reg,addr_parity_gen;
wire qual_BG_;

//Data section wires

wire [255:0] data,mux_out;
wire [31:0] dparity,dparity_gen;
//wire [3:0] dreg_clk;
wire [1:0] burst_start;
wire bs_clk,dreg0_clk,dreg1_clk,dreg2_clk,dreg3_clk,data_parity_error,qual_DBG_,
     dreg0_clk_buf,dreg1_clk_buf,dreg2_clk_buf,dreg3_clk_buf,a_en_buf_,CANX,
     dataline_en_buf_,d_en0_buf_,d_en1_buf_,d_en2_buf_,d_en3_buf_,ta,
     latch0_delay,latch1_delay,latch2_delay,latch3_delay;

//ADDRESS  BUS  INTERFACE 

assign  qual_BG_  =  ~(ABB_  &  !BG_); 
// Next Address Register
dff #(27,0,"AUTO","AUTO")
  NextAddressReg (.CLK(NARLatch),.D(NAR_IN),.Q(NAR));

//Generate address parity.
parityo_gen32
  AddrParityGen (.D({NAR,GND,GND,GND,GND,GND}),.PGEN(addr_parity_gen));

//Address Output Registers and buffers
dff #(27,0,"AUTO","AUTO")
  AddressReg (.CLK(a_latch),.D(NAR),.Q(a_reg));
dff #(4,0,"AUTO","AUTO")
  AddrParReg (.CLK(a_latch),.D(addr_parity_gen),.Q(ap_reg));
tribuf #(32,0,"AUTO","AUTO")
  a_buffer (.EN(a_en_buf_),.IN0({a_reg,GND,GND,GND,GND,GND}),.Y(A));
stdbuf #("9") AEN_BUF (.IN0(a_en_),.Y(a_en_buf_));
tribuf #(4,0,"AUTO","AUTO")
  ap_buffer (.EN(a_en_),.IN0(ap_reg),.Y(AP));
tribuf #(5,0,"AUTO","AUTO")
  tt_buffer (.EN(a_en_),.IN0({GND,VDD,VDD,VDD,GND}),.Y(TT));
tribuf #(3,0,"AUTO","AUTO")
  tsize_buffer (.EN(a_en_),.IN0({GND,VDD,GND}),.Y(TSIZ));
tribuf #(2,0,"AUTO","AUTO")
  tcode_buffer (.EN(a_en_),.IN0({GND,GND}),.Y(TC));

stdtribuf abb_buffer (.EN(abb_en_),.IN0(abb_reg_),.Y(ABB_));
stdtribuf tbst_buffer (.EN(tbst_en_),.IN0(GND),.Y(TBST_));
stdtribuf ts_buffer (.EN(ts_en_),.IN0(ts_reg_),.Y(TS_));


//ADDRESS  FINITE  STATE  MACHINE 

parameter // epoch enum astat
  A_IDLE             = 3'd0,
  WAIT_FOR_BG        = 3'd1,
  MASTER             = 3'd2,
  TRANSFER           = 3'd3,
  WAIT_FOR_AACK      = 3'd4,
  TERMINATION        = 3'd5,
  WAIT_FOR_NOT_FETCH = 3'd6,
  dc_astate          = 3'bxx;

reg [2:0] /* epoch enum astat */ astate, next_astate;
reg a_latch,a_en_,abb_reg_,abb_en_,BR_,NARLatch,snoop_ignore,
    tbst_en_,ts_reg_,ts_en_;

always @(posedge clk or negedge HRESET_)
begin
  if (!HRESET_)
    astate = A_IDLE;
  else
    astate = next_astate;
end

always @(astate or fetch or qual_BG_ or AACK_)
begin
  //default values
  a_latch      = 1'b0;
  a_en_        = 1'b1;
  abb_reg_     = 1'b1;
  abb_en_      = 1'b1;
  BR_          = 1'b1;
  NARLatch     = 1'b0;
  snoop_ignore = 1'b0;
  tbst_en_     = 1'b1;
  ts_reg_      = 1'b1;
  ts_en_       = 1'b1;

  case (astate)

  A_IDLE:
    begin
      if (fetch)
        next_astate = WAIT_FOR_BG;
      else next_astate = A_IDLE;
    end

  WAIT_FOR_BG:
    begin
      BR_          = 1'b0; // Request the bus.
      NARLatch     = 1'b1; // Latch the Next Address.
      if (qual_BG_ == 1'b0)
        next_astate = MASTER;
      else next_astate = WAIT_FOR_BG;
    end

  MASTER:
    begin
      a_latch      = 1'b1; // Latch transfer attributes.
      a_en_        = 1'b0; // Enable attribute outputs.
      abb_reg_     = 1'b0; //Take the address bus.
      abb_en_      = 1'b0;
      snoop_ignore = 1'b1; //Tell snooper to ignore this transaction.
      tbst_en_     = 1'b0; // Another transfer attribute.
      ts_reg_      = 1'b0; //Start the transfer.
      ts_en_       = 1'b0;
      next_astate = TRANSFER;
    end

  TRANSFER:
    begin
      a_en_        = 1'b0;
      abb_reg_     = 1'b0;
      abb_en_      = 1'b0;
      snoop_ignore = 1'b1;
      tbst_en_     = 1'b0;
      ts_reg_      = 1'b1;
      ts_en_       = 1'b0;
      if (AACK_ == 1'b1)
        next_astate = WAIT_FOR_AACK;
      else next_astate = TERMINATION;
    end

  WAIT_FOR_AACK:
    begin
      a_en_        = 1'b0;
      abb_reg_     = 1'b0;
      abb_en_      = 1'b0;
      snoop_ignore = 1'b1;
      tbst_en_     = 1'b0;
      if (AACK_ == 1'b1)
        next_astate = WAIT_FOR_AACK;
      else next_astate = TERMINATION;
    end

  TERMINATION:
    begin
      abb_reg_     = 1'b1; // Relinquish the address bus.
      abb_en_      = 1'b0;
      next_astate = WAIT_FOR_NOT_FETCH;
    end

  WAIT_FOR_NOT_FETCH:
    begin
      if (fetch == 1'b1)
        next_astate = WAIT_FOR_NOT_FETCH;
      else next_astate = A_IDLE;
    end

  default:
    begin
      next_astate = dc_astate;
    end

  endcase
end


//DATA  BUS  INTERFACE 
assign qual_DBG_ = ~(DBB_ & !DBG_);

// burst_start latch

stdand2 BS_AND (.IN0(send),.IN1(clk),.Y(bs_clk));

dff #(2,0,"AUTO","AUTO")
  BurstStartReg (.CLK(bs_clk),.D(BURSTSTART),.Q(burst_start));

// Odd Parity Generator/Checker

// epoch precompiled parityo_chkgen256

parityo_chkgen256 DataParityGen
  (.D(data),.PIN(dparity),.ERROR(data_parity_error),.PGEN(dparity_gen));

assign DPE_ = ~data_parity_error;

//data registers

stdbufinv TA_INV (.IN0(TA_),.Y(ta));

//Delay buffer required for timing of latch signals. Gates = 4 results in smallest layout area.

stddelaybuf #(1,4,"AUTO") LatchDelay0 (.IN0(latch0),.Y(latch0_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay1 (.IN0(latch1),.Y(latch1_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay2 (.IN0(latch2),.Y(latch2_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay3 (.IN0(latch3),.Y(latch3_delay));

stdand3 #("CRITICAL")
  DR0_AND (.IN0(clk),.IN1(latch0_delay),.IN2(ta),.Y(dreg0_clk));
stdand3 #("CRITICAL")
  DR1_AND (.IN0(clk),.IN1(latch1_delay),.IN2(ta),.Y(dreg1_clk));
stdand3 #("CRITICAL")
  DR2_AND (.IN0(clk),.IN1(latch2_delay),.IN2(ta),.Y(dreg2_clk));
stdand3 #("CRITICAL")
  DR3_AND (.IN0(clk),.IN1(latch3_delay),.IN2(ta),.Y(dreg3_clk));
stdbuf #("CRITICAL") DR0_BUF (.IN0(dreg0_clk),.Y(dreg0_clk_buf));
stdbuf #("CRITICAL") DR1_BUF (.IN0(dreg1_clk),.Y(dreg1_clk_buf));
stdbuf #("CRITICAL") DR2_BUF (.IN0(dreg2_clk),.Y(dreg2_clk_buf));
stdbuf #("CRITICAL") DR3_BUF (.IN0(dreg3_clk),.Y(dreg3_clk_buf));
dff #(72,0,"AUTO","AUTO")
  DataReg0 (.CLK(dreg0_clk_buf),.D({mux_out[63:0],DP}),
            .Q({data[63:0],dparity[7:0]}));
dff #(72,0,"AUTO","AUTO")
  DataReg1 (.CLK(dreg1_clk_buf),.D({mux_out[127:64],DP}),
            .Q({data[127:64],dparity[15:8]}));
dff #(72,0,"AUTO","AUTO")
  DataReg2 (.CLK(dreg2_clk_buf),.D({mux_out[191:128],DP}),
            .Q({data[191:128],dparity[23:16]}));
dff #(72,0,"AUTO","AUTO")
  DataReg3 (.CLK(dreg3_clk_buf),.D({mux_out[255:192],DP}),
            .Q({data[255:192],dparity[31:24]}));

//multiplexer

mux2 #(128,0,"AUTO","AUTO")
  MUXA (.IN0({D,D}),.IN1(DATALINE[127:0]),.S0(mux_sel),.Y(mux_out[127:0]));
mux2 #(128,0,"AUTO","AUTO")
  MUXB (.IN0({D,D}),.IN1(DATALINE[255:128]),.S0(mux_sel),.Y(mux_out[255:128]));

//dataline output buffer

stdbuf DATALINE_EN_BUFFER (.IN0(dataline_en_),.Y(dataline_en_buf_));
tribuf #(128,0,"AUTO","AUTO")
  dataline_bufferA (.EN(dataline_en_buf_),.IN0(data[127:0]),
                    .Y(DATALINE[127:0]));
tribuf #(128,0,"AUTO","AUTO")
  dataline_bufferB (.EN(dataline_en_buf_),.IN0(data[255:128]),
                    .Y(DATALINE[255:128]));

//data output buffers

tribuf #(64,0,"AUTO","AUTO")
  data_buffer0 (.EN(d_en0_buf_),.IN0(data[63:0]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
  data_buffer1 (.EN(d_en1_buf_),.IN0(data[127:64]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
  data_buffer2 (.EN(d_en2_buf_),.IN0(data[191:128]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
  data_buffer3 (.EN(d_en3_buf_),.IN0(data[255:192]),.Y(D));

stdbuf DEN0_BUF (.IN0(d_en0_),.Y(d_en0_buf_));
stdbuf DEN1_BUF (.IN0(d_en1_),.Y(d_en1_buf_));
stdbuf DEN2_BUF (.IN0(d_en2_),.Y(d_en2_buf_));
stdbuf DEN3_BUF (.IN0(d_en3_),.Y(d_en3_buf_));

tribuf #(8,0,"AUTO","AUTO")
  dparity_buffer0 (.EN(d_en0_),.IN0(dparity_gen[7:0]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
  dparity_buffer1 (.EN(d_en1_),.IN0(dparity_gen[15:8]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
  dparity_buffer2 (.EN(d_en2_),.IN0(dparity_gen[23:16]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
  dparity_buffer3 (.EN(d_en3_),.IN0(dparity_gen[31:24]),.Y(DP));

stdtribuf dbb_buffer (.EN(dbb_en_),.IN0(dbb_reg_),.Y(DBB_));
stdtribuf ta_buffer (.EN(ta_en_),.IN0(GND),.Y(TA_));
stdbuf #("26") CANX_BUF (.IN0(cancel),.Y(CANX));

//DATA FINITE STATE MACHINE

parameter // epoch enum dstat
D_IDLE                 = 5'd0,
WAIT_FOR_DBG           = 5'd1,
FIRST_BEAT             = 5'd2,
SECOND_BEAT            = 5'd3,
THIRD_BEAT             = 5'd4,
FOURTH_BEAT            = 5'd5,
FETCH_TERMINATE        = 5'd6,
UPLOAD1                = 5'd7,
ABORT1                 = 5'd8,
D_WAIT_FOR_NOT_FETCH_A = 5'd9,
D_WAIT_FOR_NOT_FETCH_B = 5'd10,
START_SEND             = 5'd12,
SEND00                 = 5'd13,
SEND01                 = 5'd14,
SEND02                 = 5'd15,
SEND03                 = 5'd16,
SEND10                 = 5'd17,
SEND11                 = 5'd18,
SEND12                 = 5'd19,
SEND13                 = 5'd20,
SEND20                 = 5'd21,
SEND21                 = 5'd22,
SEND22                 = 5'd23,
SEND23                 = 5'd24,
SEND30                 = 5'd25,
SEND31                 = 5'd26,
SEND32                 = 5'd27,
SEND33                 = 5'd28,
SEND_TERMINATE         = 5'd29,
dc_dstate              = 5'bxx;

reg [4:0] /* epoch enum dstat */ dstate, next_dstate;

reg cancel,dbb_reg_,dbb_en_,dataline_en_,d_en0_,d_en1_,d_en2_,d_en3_,
    download,fetch_done,
    fetch_abort,latch0,latch1,latch2,latch3,mux_sel,send_done,upload,ta_en_;

always @(posedge clk or negedge HRESET_)
begin
if (!HRESET_)
dstate = D_IDLE;
else
dstate = next_dstate;
end

always  @ (dstate  or  fetch  or  send  or  qual_DBG_  or  TA_  or 
data_parity_error  or  burst_start) 
begin 


//default  values 

cancel       = 1'b0;
dbb_reg_     = 1'b1;
dbb_en_      = 1'b1;
dataline_en_ = 1'b1;
d_en0_       = 1'b1;
d_en1_       = 1'b1;
d_en2_       = 1'b1;
d_en3_       = 1'b1;
download     = 1'b0;
fetch_done   = 1'b0;
fetch_abort  = 1'b0;
latch0       = 1'b0;
latch1       = 1'b0;
latch2       = 1'b0;
latch3       = 1'b0;
mux_sel      = 1'b0;
send_done    = 1'b0;
ta_en_       = 1'b1;
upload       = 1'b0;

case (dstate)

D_IDLE:
begin
if (fetch)
next_dstate = WAIT_FOR_DBG;
else if (send) next_dstate = START_SEND;
else next_dstate = D_IDLE;
end

WAIT_FOR_DBG:
begin
if (qual_DBG_ == 1'b0)
next_dstate = FIRST_BEAT;
else next_dstate = WAIT_FOR_DBG;
end

FIRST_BEAT:
begin
dbb_reg_ = 1'b0;
dbb_en_ = 1'b0;
latch0 = 1'b1;
if (TA_ == 1'b1)
next_dstate = FIRST_BEAT;
else next_dstate = SECOND_BEAT;
end

SECOND_BEAT:
begin
dbb_reg_ = 1'b0;
dbb_en_ = 1'b0;
latch1 = 1'b1;
if (TA_ == 1'b1)
next_dstate = SECOND_BEAT;
else next_dstate = THIRD_BEAT;
end

THIRD_BEAT:
begin
dbb_reg_ = 1'b0;
dbb_en_ = 1'b0;
latch2 = 1'b1;
if (TA_ == 1'b1)
next_dstate = THIRD_BEAT;
else next_dstate = FOURTH_BEAT;
end

FOURTH_BEAT:
begin
dbb_reg_ = 1'b0;
dbb_en_ = 1'b0;
latch3 = 1'b1;
if (TA_ == 1'b1)
next_dstate = FOURTH_BEAT;
else next_dstate = FETCH_TERMINATE;
end

FETCH_TERMINATE:
begin
dbb_reg_ = 1'b1;
dbb_en_ = 1'b0;
if (data_parity_error == 1'b1)
next_dstate = ABORT1;
else next_dstate = UPLOAD1;
end

UPLOAD1:
begin
dataline_en_ = 1'b0;
fetch_done = 1'b1;
upload = 1'b1;
next_dstate = D_WAIT_FOR_NOT_FETCH_A;
end

ABORT1:
begin
fetch_abort = 1'b1;
next_dstate = D_WAIT_FOR_NOT_FETCH_B;
end

D_WAIT_FOR_NOT_FETCH_A:
begin
dataline_en_ = 1'b0; // To meet data hold requirements of hsram
                     // in Data List.
fetch_done = 1'b1;
if (fetch == 1'b1)
next_dstate = D_WAIT_FOR_NOT_FETCH_A;
else next_dstate = D_IDLE;
end

D_WAIT_FOR_NOT_FETCH_B:
begin
fetch_abort = 1'b1;
if (fetch == 1'b1)
next_dstate = D_WAIT_FOR_NOT_FETCH_B;
else next_dstate = D_IDLE;
end

START_SEND:
begin
cancel = 1'b1;
download = 1'b1;
latch0 = 1'b1;
latch1 = 1'b1;
latch2 = 1'b1;
latch3 = 1'b1;
mux_sel = 1'b1;
if (burst_start == 2'd0) next_dstate = SEND00;
else if (burst_start == 2'd1) next_dstate = SEND11;
else if (burst_start == 2'd2) next_dstate = SEND22;
else if (burst_start == 2'd3) next_dstate = SEND33;
else next_dstate = START_SEND;
end

SEND00:
begin
ta_en_ = 1'b0;
d_en0_ = 1'b0;
next_dstate = SEND01;
end

SEND01:
begin
ta_en_ = 1'b0;
d_en1_ = 1'b0;
next_dstate = SEND02;
end

SEND02:
begin
ta_en_ = 1'b0;
d_en2_ = 1'b0;
next_dstate = SEND03;
end

SEND03:
begin
ta_en_ = 1'b0;
d_en3_ = 1'b0;
send_done = 1'b1;
next_dstate = SEND_TERMINATE;
end

SEND11:
begin
ta_en_ = 1'b0;
d_en1_ = 1'b0;
next_dstate = SEND12;
end

SEND12:
begin
ta_en_ = 1'b0;
d_en2_ = 1'b0;
next_dstate = SEND13;
end

SEND13:
begin
ta_en_ = 1'b0;
d_en3_ = 1'b0;
next_dstate = SEND10;
end

SEND10:
begin
ta_en_ = 1'b0;
d_en0_ = 1'b0;
send_done = 1'b1;
next_dstate = SEND_TERMINATE;
end

SEND22:
begin
ta_en_ = 1'b0;
d_en2_ = 1'b0;
next_dstate = SEND23;
end

SEND23:
begin
ta_en_ = 1'b0;
d_en3_ = 1'b0;
next_dstate = SEND20;
end

SEND20:
begin
ta_en_ = 1'b0;
d_en0_ = 1'b0;
next_dstate = SEND21;
end

SEND21:
begin
ta_en_ = 1'b0;
d_en1_ = 1'b0;
send_done = 1'b1;
next_dstate = SEND_TERMINATE;
end

SEND33:
begin
ta_en_ = 1'b0;
d_en3_ = 1'b0;
next_dstate = SEND30;
end

SEND30:
begin
ta_en_ = 1'b0;
d_en0_ = 1'b0;
next_dstate = SEND31;
end

SEND31:
begin
ta_en_ = 1'b0;
d_en1_ = 1'b0;
next_dstate = SEND32;
end

SEND32:
begin
ta_en_ = 1'b0;
d_en2_ = 1'b0;
send_done = 1'b1;
next_dstate = SEND_TERMINATE;
end

SEND_TERMINATE:
begin
next_dstate = D_IDLE;
end

default:
begin
next_dstate = dc_dstate;
end

endcase
end
endmodule
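
The SENDxy states in the data state machine appear to encode a four-beat wrap-around burst: x is the doubleword selected by burst_start and y is the doubleword whose output enable is asserted on the current beat. Read that way, every burst visits all four doublewords, starting at burst_start and wrapping modulo 4. The fragment below is a minimal behavioral sketch of that ordering only. It is not part of the original design, and the module name burst_order_sketch and its ports are hypothetical.

```verilog
// Hypothetical illustration, not part of the PRC source: computes which
// doubleword is driven on a given beat of a four-beat wrap-around burst.
module burst_order_sketch (burst_start, beat, dword);
input  [1:0] burst_start;  // first doubleword requested (0 to 3)
input  [1:0] beat;         // beat number within the burst (0 to 3)
output [1:0] dword;        // doubleword enabled on this beat

// The sum is truncated to two bits, so it wraps modulo 4.
// Example: burst_start = 2 gives the order 2, 3, 0, 1,
// which matches the sequence SEND22, SEND23, SEND20, SEND21.
assign dword = burst_start + beat;

endmodule
```

The state machine spells this order out as explicit states instead of an adder, presumably so that each state drives exactly one of the d_en0_ through d_en3_ enables directly.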


1.    Odd  Parity  Checker/Generator  With  256  Inputs 


* ODD PARITY CHECKER AND GENERATOR
* Filename: parityo_chkgen256.v
* Author: Joseph R. Robert, Jr.
* Date: 29FEB96
* Revised: 29FEB96
*
* Purpose: This module checks the parity of the input data, comparing it
* to the input parity. Parity is odd including the parity bit. This module
* also generates the parity for the input data in groups of eight input bits.


module parityo_chkgen256 (D,PIN,ERROR,PGEN);

//epoch set_attribute FIXEDBLOCK = 1

input [255:0] D;
input [31:0] PIN;
output [31:0] PGEN;
output ERROR;

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR;

parityo_chk64 parity_group_0
    (.D(D[63:0]),.PIN(PIN[7:0]),.ERROR(ERROR_0),.PGEN(PGEN[7:0]));
parityo_chk64 parity_group_1
    (.D(D[127:64]),.PIN(PIN[15:8]),.ERROR(ERROR_1),.PGEN(PGEN[15:8]));
parityo_chk64 parity_group_2
    (.D(D[191:128]),.PIN(PIN[23:16]),.ERROR(ERROR_2),.PGEN(PGEN[23:16]));
parityo_chk64 parity_group_3
    (.D(D[255:192]),.PIN(PIN[31:24]),.ERROR(ERROR_3),.PGEN(PGEN[31:24]));

stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR);

endmodule

module parityo_chk64 (D,PIN,ERROR,PGEN);

input [63:0] D;
input [7:0] PIN;
output [7:0] PGEN;
output ERROR;

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_A,
     ERROR_B,ERROR;

paritycgo #(8,0,"AUTO","1")
    parity_group_0 (.D(D[7:0]),.PIN(PIN[0]),.ERROR(ERROR_0),.PGEN(PGEN[0]));
paritycgo #(8,0,"AUTO","1")
    parity_group_1 (.D(D[15:8]),.PIN(PIN[1]),.ERROR(ERROR_1),.PGEN(PGEN[1]));
paritycgo #(8,0,"AUTO","1")
    parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2),.PGEN(PGEN[2]));
paritycgo #(8,0,"AUTO","1")
    parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3),.PGEN(PGEN[3]));
paritycgo #(8,0,"AUTO","1")
    parity_group_4 (.D(D[39:32]),.PIN(PIN[4]),.ERROR(ERROR_4),.PGEN(PGEN[4]));
paritycgo #(8,0,"AUTO","1")
    parity_group_5 (.D(D[47:40]),.PIN(PIN[5]),.ERROR(ERROR_5),.PGEN(PGEN[5]));
paritycgo #(8,0,"AUTO","1")
    parity_group_6 (.D(D[55:48]),.PIN(PIN[6]),.ERROR(ERROR_6),.PGEN(PGEN[6]));
paritycgo #(8,0,"AUTO","1")
    parity_group_7 (.D(D[63:56]),.PIN(PIN[7]),.ERROR(ERROR_7),.PGEN(PGEN[7]));

stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_A);
stdor4 OR_B (ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_B);
stdor2 OR_C (ERROR_A,ERROR_B,ERROR);

endmodule
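
The structural listing relies on the library cell paritycgo, whose internals are not shown here. As a reading aid, the fragment below is a minimal behavioral sketch of what one eight-bit group would compute if it follows the stated purpose (parity is odd including the parity bit). It is an assumption based on that description, not the library cell's actual implementation, and the module name odd_parity8_sketch is hypothetical.

```verilog
// Hypothetical behavioral model of one 8-bit odd-parity group.
// Assumes "odd including the parity bit": data plus parity bit
// must contain an odd number of ones.
module odd_parity8_sketch (D, PIN, ERROR, PGEN);
input  [7:0] D;     // data byte
input        PIN;   // parity bit received with the data
output       ERROR; // 1 when D and PIN together have even parity
output       PGEN;  // parity bit generated for D

// ^D is 1 when D has an odd number of ones, so the generated bit is its
// complement: it adds a one only when D alone has an even count.
assign PGEN = ~(^D);

// Reduction XOR over all nine bits is 1 for an odd count (correct),
// so the error flag is its complement.
assign ERROR = ~(^{D, PIN});

endmodule
```

Under this model, a byte of 8'h00 generates PGEN = 1, and feeding that bit back as PIN leaves ERROR at 0.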


2.    Odd Parity Generator With 32 Inputs


* ODD PARITY GENERATOR
* Filename: parityo_gen32.v
* Author: Joseph R. Robert, Jr.
* Date: 12FEB96
* Revised: 29FEB96
*
* Purpose: This module generates one odd parity bit for each group of
* eight inputs.

module parityo_gen32 (D,PGEN);

input [31:0] D;
output [3:0] PGEN;
wire [3:0] PGEN;

parityo #(8,0,"AUTO","1") parity_group_0 (.D(D[7:0]),.PGEN(PGEN[0]));
parityo #(8,0,"AUTO","1") parity_group_1 (.D(D[15:8]),.PGEN(PGEN[1]));
parityo #(8,0,"AUTO","1") parity_group_2 (.D(D[23:16]),.PGEN(PGEN[2]));
parityo #(8,0,"AUTO","1") parity_group_3 (.D(D[31:24]),.PGEN(PGEN[3]));

endmodule


H.     TEST  RESULTS 


Host command: verilog
Command arguments:
-f verilog_arguments
-v /tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v
prc.v
prc_top.v
sequencer4.v
tarbiter.v
tcpu.v
testbench.v
tmemory.v

VERILOG-XL  2.1.2  log  file  created  Mar  19,  1996  11:53:03 
VERILOG-XL  2.1.2    Mar  19,  1996  11:53:03 

Copyright  (c)  1994  Cadence  Design  Systems,  Inc.  All  Rights  Reserved. 
Unpublished  —  rights  reserved  under  the  copyright  laws  of  the  United  States. 

Copyright  (c)  1994  UNIX  Systems  Laboratories,  Inc.  Reproduced  with  Permission. 

THIS  SOFTWARE  AND  ON-LINE  DOCUMENTATION  CONTAIN  CONFIDENTIAL  INFORMATION 
AND  TRADE  SECRETS  OF  CADENCE  DESIGN  SYSTEMS,  INC.  USE,  DISCLOSURE,  OR 
REPRODUCTION  IS  PROHIBITED  WITHOUT  THE  PRIOR  EXPRESS  WRITTEN  PERMISSION  OF 
CADENCE  DESIGN  SYSTEMS,  INC. 
RESTRICTED  RIGHTS  LEGEND 

Use,  duplication,  or  disclosure  by  the  Government  is  subject  to 
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
Technical  Data  and  Computer  Software  clause  at  DFARS  252.227-7013  or 
subparagraphs  (c)(1)  and  (2)  of  Commercial  Computer  Software  —  Restricted 
Rights  at  48  CFR  52.227-19,  as  applicable. 

Cadence  Design  Systems,  Inc. 
555  River  Oaks  Parkway 
San  Jose,  California  95134 
For technical assistance please contact the Cadence Response Center at
1-800-CADENC2 or send email to crc_customers@cadence.com

For  more  information  on  Cadence's  Verilog-XL  product  line  send  email  to 
talkverilog@cadence.com 

Compiling  source  file  "prc.v" 

Compiling  source  file  "prc_top.v" 

Compiling  source  file  "sequencer4.v" 

Compiling  source  file  "tarbiter.v" 

Compiling  source  file  "tcpu.v" 

Compiling  source  file  "testbench.v" 

Compiling  source  file  "tmemory.v" 

Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v"
Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v"
Warning! Implicit wire has no fanin [Verilog-IWFA]
"prc.v", 23159: NCO
Warning! Implicit wire has no fanin [Verilog-IWFA]
"prc.v", 23159: NCI
Warning! Implicit wire has no fanin [Verilog-IWFA]
"prc.v", 23159: NCO
Warning! Implicit wire has no fanin [Verilog-IWFA]
"prc.v", 23159: NCI
Highest  level  modules: 
testbench 

*** SDF Annotator version 1.6_beta.3
*** SDF file: /tmp_mnt/h/joshua_u2/jrrobert/thesis/verilog/hardware/prc.sdf
Back-annotation scope: testbench.PRC1.PRC1
No configuration file specified - using default options
*** SDF Annotator log file: sdf.log
*** No MTM selection parameter specified
*** No SCALE FACTORS parameter specified
No SCALE TYPE parameter specified
Configuring for back-annotation...
Reading SDF file and back-annotating timing data...

*** SDF back-annotation successfully completed
PRC granted the data bus.
(ERROR): WR and A are both unknown at time 6.700
(ERROR): WR and A are both unknown at time 6.700
(ERROR): WR and A are both unknown at time 6.700
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
Instance: testbench.PRC1.PRC1.LM1.MRMA_list.hsram.inst1
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
Instance: testbench.PRC1.PRC1.DL1.data_ram1.hsram.inst1
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
Instance: testbench.PRC1.PRC1.DL1.data_ram0.hsram.inst1
System hard reset at time 35.

CPU started read from address 00000000 at time 45.
CPU read: 0001020304050607 at 211
CPU read: 08090a0b0c0d0e0f at 271
CPU read: 1011121314151617 at 331
PRC requested the bus.
CPU read: 18191a1b1c1d1e1f at 391
CPU started read from address 00000020 at time 420.
CPU read: 2021222324252627 at 556
CPU read: 28292a2b2c2d2e2f at 616
CPU read: 3031323334353637 at 676
CPU read: 38393a3b3c3d3e3f at 736
PRC granted the data bus.
CPU started read from address 00000180 at time 1215.
CPU read: 0001020304050607 at 1381
CPU read: 08090a0b0c0d0e0f at 1441
CPU read: 1011121314151617 at 1501
CPU read: 18191a1b1c1d1e1f at 1561
CPU started read from address 000001a0 at time 1665.
CPU read: 2021222324252627 at 1831
PRC requested the bus.
CPU read: 28292a2b2c2d2e2f at 1891
CPU read: 3031323334353637 at 1951
CPU read: 38393a3b3c3d3e3f at 2011
PRC granted the data bus.
CPU started read from address 00000040 at time 2490.
CPU read: 4041424344454647 at 2641
CPU read: 4041424344454647 at 2656
CPU read: 5051525354555657 at 2671
CPU read: 4041424344454647 at 2686
PRC requested the bus.
PRC granted the data bus.
CPU started write to address 000001c0 at time 3307.
CPU write beat 1: 7777777777777777 at 3322
CPU write beat 2: 8888888888888888 at 3488
CPU write beat 3: 1111111111111111 at 3548
CPU write beat 4: 3333333333333333 at 3608
CPU started read from address 00000060 at time 3765.
CPU read: 6061626364656667 at 3916
CPU read: 6061626364656667 at 3931
CPU read: 7071727374757677 at 3946
CPU read: 6061626364656667 at 3961
PRC requested the bus.
PRC granted the data bus.
CPU started read from address 000001c0 at time 4440.
CPU read: 7777777777777777 at 4606
CPU read: 8888888888888888 at 4666
CPU read: 1111111111111111 at 4726
CPU read: 3333333333333333 at 4786
L125 "testbench.v": $finish at simulation time 5035000
4  warnings 

158647  simulation  events  +  266655  accelerated  events  +  926440  timing  check  events 
CPU time: 6.1 secs to compile + 161.8 secs to link + 377.5 secs in simulation
End  of  VERILOG-XL  2.1.2    Mar  19,  1996  12:15:44 